ABSTRACT. This paper presents a new exponential lower bound for the two most popular deterministic
variants of the strategy improvement algorithms for solving parity, mean payoff, discounted
payoff and simple stochastic games. The first variant improves every node in each step, maximizing
the current valuation locally, whereas the second variant computes the globally optimal improvement
in each step. We outline families of games on which both variants require exponentially many
strategy iterations.



In this paper, we study lower bounds for strategy improvement algorithms for solving parity games, 
mean payoff games, discounted payoff games as well as simple stochastic games. These are 
two-player games of perfect information played on directed graphs, and are related by a chain of 
polynomial-time reductions. 

Parity games can be reduced to mean payoff games [Pur95], mean payoff games to discounted
payoff games, and the latter to simple stochastic games [ZP96]. Solving games of any of these
classes is one of the few combinatorial problems that belong to the complexity class NP $\cap$ coNP and
that are not (yet) known to belong to P [EJS93, Con92]. It has also been shown that solving parity
games as well as mean and discounted payoff games belongs to UP $\cap$ coUP [Jur98].

We mainly consider parity games in this paper. They are played on a directed graph that is 
partitioned into two node sets associated with the two players; the nodes are labeled with natural 
numbers, called priorities. A play in a parity game is an infinite sequence of nodes whose winner 
is determined by the parity of the highest priority that occurs infinitely often, giving parity games 
their name. 

The reason why parity games seem to be the most appropriate class of games, when trying to 
construct a worst-case family for one of the four classes, is that the effect of each node in a parity 
game is very clear: a higher priority dominates all lower priorities (in a play), no matter how many 
there are. By showing that the strategy iteration on our family of parity games directly corresponds 
to the strategy iteration that solves the other classes of games, we get the lower bounds for these by 
applying the standard reductions to our games. 
Parity games occur in several fields of theoretical computer science, e.g. as a solution to the
problem of emptiness of tree automata [GTW02, EJ91] or as an algorithmic backend to the model
checking problem of the modal $\mu$-calculus [EJS93, Sti95].

There are many algorithms that solve parity games, such as the recursive decomposing algorithm
due to Zielonka [Zie98] and its recent improvement by Jurdziński, Paterson and Zwick
[JPZ06], the small progress measures algorithm due to Jurdziński [Jur00] with its recent improvement
by Schewe [Sch07], the model-checking algorithm due to Stevens and Stirling [SS98] and, finally,
the two strategy improvement algorithms by Vöge and Jurdziński [VJ00] and Schewe [Sch08].

All mentioned algorithms except for the two deterministic subexponential algorithms [JPZ06,
Sch07] and except for the two strategy improvement algorithms have been shown to have a superpolynomial
or exponential worst-case runtime complexity at best [Jur00, Fri10b, Fri10a]. The currently
best known upper bound on the deterministic solution of parity games is $O(|E| \cdot |V|^{\frac{1}{3}|\mathrm{ran}\,\Omega|})$
due to Schewe's big-step algorithm [Sch07].

The strategy improvement, strategy iteration or policy iteration technique is the most general
approach that can be applied as a solving procedure for all of these game classes. It was introduced
by Howard [How60] for solving problems on Markov decision processes and has been adapted by
several other authors for solving nonterminating stochastic games [HK66], simple stochastic games
[Con92], discounted and mean payoff games [Pur95, ZP96] as well as parity games [VJ00].

Strategy iteration is an algorithmic scheme that is parameterized by an improvement policy 
which basically defines how to select a successor strategy in the iteration process. There are two 
major kinds of improvement policies: deterministic and randomized approaches; we will investigate 
deterministic approaches in this paper. 

For discounted payoff games, there is the deterministic algorithm due to Puri [Pur95] that can
also be used to solve mean payoff games as well as parity games by reduction [ZP96, VJ00]. Vöge
and Jurdziński's improvement algorithm is a refined version of Puri's on parity games that omits the
use of high-precision rational numbers; there are at least two reasonable improvement policies for
the Vöge-Jurdziński procedure appearing in the literature, namely the standard locally optimizing
policy and Schewe's globally optimizing policy.

An example has been known for some time for which a sufficiently poor choice of a single-switch
policy causes an exponential number of iterations of the strategy improvement algorithm
[BV07], but there have been no games known so far on which the policies due to Vöge/Jurdziński
or Schewe require more than linearly many iterations.

In this paper, we particularly investigate the locally optimizing policy - which is, by far, the
most natural choice for a multi-switching improvement policy - for solving parity games as it is
applied by default in the original paper of Vöge and Jurdziński. We present a family of games
comprising a linear number of nodes and a quadratic number of edges such that the strategy improvement
algorithm using this policy requires an exponential number of iterations on them. We
explain how these games can be refined in such a way that they only comprise a linear number of
edges, resulting in an undeniable exponential lower bound. Additionally, we describe which parts of
the games have to be altered in order to get a family that results in exponentially many iterations
when solved by Schewe's strategy improvement algorithm.

Finally, we show that the parity game strategy iteration on our games directly corresponds to 
the strategy iteration that solves the associated mean payoff, discounted payoff as well as simple 
stochastic games, resulting in an exponential lower bound for the standard strategy improvement 
algorithms for all of these game classes. 

Section 2 defines the basic notions of parity games and some notations that are employed
throughout the paper. Section 3 recaps the strategy improvement algorithm by Vöge and Jurdziński;
we define the two considered improvement policies in Section 4. In Section 5, we define a subclass
of parity games called sink games that allows us to relate the lower bounds for parity games to the
other game classes. Section 6 presents a family of games on which the locally improving algorithm
requires an exponential number of iterations. We discuss some improvements of the family in
Section 7. In Section 8, we consider the modifications that have to be applied to our construction to
obtain a lower bound for the globally optimizing policy. In Section 9, we show how to transfer the
lower bounds to mean payoff, discounted payoff and simple stochastic games.

2. Parity Games 

A parity game is a tuple $G = (V, V_0, V_1, E, \Omega)$ where $(V, E)$ forms a directed graph whose node
set is partitioned into $V = V_0 \cup V_1$ with $V_0 \cap V_1 = \emptyset$, and $\Omega : V \to \mathbb{N}$ is the priority function that
assigns to each node a natural number called the priority of the node. We assume the graph to be
total, i.e. for every $v \in V$ there is a $w \in V$ s.t. $(v, w) \in E$.

In the following we will restrict ourselves to finite parity games. W.l.o.g. we assume $\Omega$ to be
injective, i.e. there are no two different nodes with the same priority.

We also use infix notation $vEw$ instead of $(v, w) \in E$ and define the set of all successors of
$v$ as $vE := \{w \mid vEw\}$. The size $|G|$ of a parity game $G = (V, V_0, V_1, E, \Omega)$ is defined to be
the cardinality of $E$, i.e. $|G| := |E|$; since we assume parity games to be total w.r.t. $E$, this is a
reasonable way to measure the size.
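
As a minimal sketch of the definition above, a finite parity game can be stored as three maps. The class name `ParityGame`, its field names and the example game are illustrative assumptions and do not appear in the paper; the two checks correspond to the totality and injectivity assumptions made in this section.

```python
from dataclasses import dataclass


@dataclass
class ParityGame:
    """Finite parity game; all names here are illustrative, not from the paper."""
    owner: dict      # node -> 0 or 1 (the player controlling the node)
    priority: dict   # node -> natural number (the priority function)
    edges: dict      # node -> list of successors (the set vE)

    def successors(self, v):
        return self.edges[v]

    def size(self):
        # |G| is defined as the number of edges.
        return sum(len(ws) for ws in self.edges.values())

    def is_total(self):
        # Every node needs at least one successor.
        return all(len(self.edges.get(v, [])) > 0 for v in self.owner)

    def has_injective_priorities(self):
        # No two different nodes may share a priority.
        return len(set(self.priority.values())) == len(self.priority)


game = ParityGame(
    owner={"a": 0, "b": 1, "c": 1},
    priority={"a": 2, "b": 3, "c": 1},
    edges={"a": ["b", "c"], "b": ["a"], "c": ["c"]},
)
```

Here `game.size()` counts four edges, and both checks succeed for this small example.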

The game is played between two players called 0 and 1: starting in a node $v_0 \in V$, they
construct an infinite path through the graph as follows. If the construction so far has yielded a
finite sequence $v_0 \ldots v_n$ and $v_n \in V_i$, then player $i$ selects a $w \in v_n E$ and the play continues with
$v_0 \ldots v_n w$.

Every play has a unique winner given by the parity of the greatest priority that occurs infinitely
often. The winner of the play $v_0 v_1 v_2 \ldots$ is player $i$ iff $\max\{p \mid \forall j \in \mathbb{N}\, \exists k \geq j : \Omega(v_k) = p\} \equiv_2 i$
(where $i \equiv_k j$ holds iff $|i - j| \bmod k = 0$). That is, player 0 tries to make an even priority occur
infinitely often without any greater odd priorities occurring infinitely often; player 1 attempts the
converse.
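
For an ultimately periodic play, i.e. a finite prefix followed by a cycle that repeats forever, the winning condition reduces to a one-line check. The sketch below assumes that a play is given as two lists of priorities; the function name `play_winner` is hypothetical.

```python
def play_winner(prefix, cycle):
    """Winner (0 or 1) of the play prefix . cycle^omega, given as lists of priorities."""
    if not cycle:
        raise ValueError("the repeated part of the play must be non-empty")
    # Only priorities on the cycle occur infinitely often; the prefix is irrelevant.
    return max(cycle) % 2
```

For instance, a play that eventually cycles through the priorities 2 and 1 is won by player 0, no matter how large the priorities on the prefix are.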

We depict parity games as directed graphs where nodes owned by player 0 are drawn as circles
and nodes owned by player 1 are drawn as rectangles; all nodes are labeled with their respective
priority, and - if needed - with their name.

A strategy for player $i$ is a - possibly partial - function $\sigma : V^* V_i \to V$, s.t. for all sequences
$v_0 \ldots v_n$ with $v_{j+1} \in v_j E$ for all $j = 0, \ldots, n - 1$, and all $v_n \in V_i$ we have: $\sigma(v_0 \ldots v_n) \in v_n E$.
A play $v_0 v_1 \ldots$ conforms to a strategy $\sigma$ for player $i$ if for all $j \in \mathbb{N}$ we have: if $v_j \in V_i$ then
$v_{j+1} = \sigma(v_0 \ldots v_j)$. Intuitively, conforming to a strategy means to always make those choices that
are prescribed by the strategy. A strategy $\sigma$ for player $i$ is a winning strategy in node $v$ if player $i$
wins every play that begins in $v$ and conforms to $\sigma$.

A strategy $\sigma$ for player $i$ is called positional if for all $v_0 \ldots v_n \in V^* V_i$ and all $w_0 \ldots w_m \in V^* V_i$
we have: if $v_n = w_m$ then $\sigma(v_0 \ldots v_n) = \sigma(w_0 \ldots w_m)$. That is, the choice of the strategy
on a finite path only depends on the last node on that path.

With $G$ we associate two sets $W_0, W_1 \subseteq V$; $W_i$ is the set of all nodes $v$ s.t. player $i$ wins the
game $G$ starting in $v$. Here we restrict ourselves to positional strategies because it is well-known
that a player has a (general) winning strategy iff she has a positional winning strategy for a given
game. In fact, parity games enjoy positional determinacy, meaning that for every node $v$ in the game
either $v \in W_0$ or $v \in W_1$ [EJ91]. Furthermore, it is not difficult to show that, whenever player $i$ has
winning strategies $\sigma_v$ for all $v \in U$ for some $U \subseteq V$, then there is also a single strategy $\sigma$ that is
winning for player $i$ from every node in $U$.

The problem of solving a parity game is to compute $W_0$ and $W_1$ as well as corresponding
winning strategies $\sigma_0$ and $\sigma_1$ for the players on their respective winning regions.

A strategy $\sigma$ for player $i$ induces a strategy subgame $G|_\sigma := (V, V_0, V_1, E|_\sigma, \Omega)$ where
$E|_\sigma := \{(u, v) \in E \mid u \in \mathrm{dom}(\sigma) \Rightarrow \sigma(u) = v\}$. Such a subgame $G|_\sigma$ is basically the same game as $G$
with the restriction that whenever $\sigma$ provides a strategy decision for a node $u \in V_i$, all transitions
from $u$ but $\sigma(u)$ are no longer accessible. The set of strategies for player $i$ is denoted by $S_i(G)$.
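
The edge set of a strategy subgame can be sketched as a simple filter. The representation is an assumption made for illustration: edges are a set of pairs and a positional, possibly partial strategy is a dictionary; the name `restrict_edges` is hypothetical.

```python
def restrict_edges(edges, sigma):
    """Keep (u, v) unless sigma is defined on u and prescribes a different successor."""
    return {(u, v) for (u, v) in edges if u not in sigma or sigma[u] == v}


edges = {("a", "b"), ("a", "c"), ("b", "a"), ("c", "c")}
sigma = {"a": "c"}          # partial strategy: only decides at node "a"
restricted = restrict_edges(edges, sigma)
```

Only the edge from `a` to `b` disappears; nodes outside the domain of the strategy keep all their edges.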

3. Strategy Improvement 

We briefly recap the basic definitions of the strategy improvement algorithm. For a given parity
game $G = (V, V_0, V_1, E, \Omega)$, the reward of node $v$ is defined as follows: $\mathrm{rew}_G(v) := \Omega(v)$ if
$\Omega(v) \equiv_2 0$ and $\mathrm{rew}_G(v) := -\Omega(v)$ otherwise. The set of even resp. odd priority nodes is defined
to be $V_\oplus := \{v \in V \mid \Omega(v) \equiv_2 0\}$ resp. $V_\ominus := \{v \in V \mid \Omega(v) \equiv_2 1\}$.

The relevance ordering $<$ on $V$ is induced by $\Omega$: $v < u :\iff \Omega(v) < \Omega(u)$; additionally
one defines the reward ordering $\prec$ on $V$ by $v \prec u :\iff \mathrm{rew}_G(v) < \mathrm{rew}_G(u)$. Note that both
orderings are total due to injectivity of the priority function.

Let $\pi$ be a path and $\sigma$ be a strategy for player $i$. We say that $\pi$ conforms to $\sigma$ iff for every $j$ with
$\pi(j) \in V_i$ we have $\sigma(\pi(j)) = \pi(j + 1)$.

Let $v$ be a node, $\sigma$ be a positional player 0 strategy and $\tau$ be a positional player 1 strategy.
Starting in $v$, there is exactly one path $\pi_{\sigma,\tau,v}$ that conforms to $\sigma$ and $\tau$. Since $\sigma$ and $\tau$ are positional
strategies, this path can be uniquely written as follows.

$\pi_{\sigma,\tau,v} = v_1 \ldots v_k (w_1 \ldots w_l)^\omega$

with $v_1 = v$, $v_i \neq w_1$ for all $1 \leq i \leq k$ and $\Omega(w_1) > \Omega(w_j)$ for all $1 < j \leq l$. Note that the
uniqueness follows from the fact that all nodes on the cycle have different priorities and we choose
$w_1$ to be the node with highest priority.

Discrete strategy improvement relies on a more abstract description of such a play $\pi_{\sigma,\tau,v}$. In
fact, we only consider the dominating cycle node $w_1$, the set of more relevant nodes - i.e. all $v_i > w_1$
- on the path to the cycle node, and the length $k$ of the path leading to the cycle node. More formally,
the node valuation of $v$ w.r.t. $\sigma$ and $\tau$ is defined as follows.

$\vartheta_{\sigma,\tau,v} := (w_1, \{v_i > w_1 \mid 1 \leq i \leq k\}, k)$

Given a node valuation $\vartheta$, we refer to $w_1$ as the cycle component, to $\{v_i > w_1 \mid 1 \leq i \leq k\}$ as the
path component, and to $k$ as the length component of $\vartheta$.
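
The following sketch computes a node valuation by following the unique play from a node. It assumes that both positional strategies have been merged into one successor map `succ` and that priorities are given as a dictionary; the function name `node_valuation` and the example graph are illustrative.

```python
def node_valuation(priority, succ, v):
    """Return (cycle component, path component, length component) for start node v."""
    seen, path, u = {}, [], v
    # Follow the play until a node repeats; the repeated part is the cycle.
    while u not in seen:
        seen[u] = len(path)
        path.append(u)
        u = succ[u]
    cycle = path[seen[u]:]
    w1 = max(cycle, key=priority.get)      # dominating cycle node
    k = path.index(w1)                     # number of nodes before w1 is reached
    relevant = frozenset(x for x in path[:k] if priority[x] > priority[w1])
    return (w1, relevant, k)


priority = {"a": 5, "b": 2, "c": 1, "d": 4}
succ = {"a": "b", "b": "c", "c": "d", "d": "c"}   # a -> b -> c -> d -> c -> ...
```

In this example the play from `a` ends in the cycle through `c` and `d`, the dominating cycle node is `d`, and only `a` is more relevant than `d` on the way there.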

In order to compare node valuations with each other, we introduce a total ordering on the set of
node valuations. For that reason, we need to define a total ordering $\prec$ on the second component of
node valuations - i.e. on subsets of $V$ - first. To compare two different sets $M$ and $N$ of nodes, we
order all nodes lexicographically w.r.t. their relevance and consider the first position in which the
two lexicographically ordered sets differ, i.e. there is a node $v \in M$ and a node $w \in N$ with $v \neq w$
s.t. $u \in M$ iff $u \in N$ for all $u > v$ and all $u > w$. Now $N$ is better than $M$ iff $v \prec w$, i.e. the set
which gives the higher reward in the first differing position is superior to the other set.

In other words, to determine which set of nodes is better w.r.t. -<, one considers the node with 
the highest priority that occurs in only one of the two sets. The set owning that node is greater than 
the other if and only if that node has an even priority. More formally: 

$M \prec N :\iff M \triangle N \neq \emptyset \text{ and } \max_<(M \triangle N) \in ((N \cap V_\oplus) \cup (M \cap V_\ominus))$

where $M \triangle N$ denotes the symmetric difference of both sets.

Now we are able to extend the total ordering on sets of nodes to node valuations. The motivation
behind this ordering is a lexicographic measurement of the profitability of a positional play w.r.t.
player 0: the most prominent part of a positional play is the cycle in which the play eventually
stays, and here it is the reward ordering on the dominating cycle node that defines the profitability
for player 0. The second important part is the loopless path that leads to the dominating cycle node.
Here, we measure the profitability of a loopless path by a lexicographic ordering on the relevancy of
the nodes on the path, applying the reward ordering on each component in the lexicographic ordering.
Finally, we consider the length, and the intuition behind the definition is that, assuming we have an
even-priority dominating cycle node, it is better to reach the cycle fast, whereas it is better to stay
out of the cycle as long as possible otherwise. More formally:

$(u, M, e) \prec (v, N, f) :\iff (u \prec v) \text{ or } (u = v \text{ and } M \prec N) \text{ or }$
$\qquad (u = v \text{ and } M = N \text{ and } e < f \text{ and } u \in V_\ominus) \text{ or }$
$\qquad (u = v \text{ and } M = N \text{ and } e > f \text{ and } u \in V_\oplus)$
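
The two orderings can be transcribed almost literally. The sketch below assumes that priorities are stored in a dictionary `prio` and that path components are Python sets or frozensets; the function names are hypothetical.

```python
def reward(v, prio):
    """Reward of a node: its priority if even, the negated priority if odd."""
    return prio[v] if prio[v] % 2 == 0 else -prio[v]


def set_less(M, N, prio):
    """M is strictly worse than N w.r.t. the ordering on path components."""
    diff = M ^ N                            # symmetric difference
    if not diff:
        return False
    top = max(diff, key=prio.get)           # most relevant node in only one set
    # N is better iff top is an even node of N or an odd node of M.
    return (top in N) == (prio[top] % 2 == 0)


def valuation_less(a, b, prio):
    """Strict ordering on node valuations (cycle node, path set, length)."""
    (u, M, e), (v, N, f) = a, b
    if u != v:
        return reward(u, prio) < reward(v, prio)
    if M != N:
        return set_less(M, N, prio)
    # Even cycle node: shorter paths are better. Odd cycle node: longer is better.
    return e > f if prio[u] % 2 == 0 else e < f


prio = {n: n for n in range(1, 7)}          # node n has priority n
empty = frozenset()
```

With these priorities, for example, the set containing only node 3 is worse than the set containing only node 2, because the most relevant differing node is odd.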

Given a player 0 strategy $\sigma$, it is our goal to find a best response counterstrategy $\tau$ that minimizes
the associated node valuations. A strategy $\tau$ is an optimal counterstrategy w.r.t. $\sigma$ iff for every
opponent strategy $\tau'$ and for every node $v$ we have: $\vartheta_{\sigma,\tau,v} \preceq \vartheta_{\sigma,\tau',v}$.

It is well-known that an optimal counterstrategy always exists and that it is efficiently com- 
putable. 

Lemma 3.1 ([VJ00]). Let $G$ be a parity game and $\sigma$ be a player 0 strategy. An optimal counterstrategy
for player 1 w.r.t. $\sigma$ exists and can be computed in polynomial time.

A fixed but arbitrary optimal counterstrategy will be denoted by $\tau_\sigma$ from now on. The associated
game valuation $\Xi_\sigma$ is a map that assigns to each node the node valuation w.r.t. $\sigma$ and $\tau_\sigma$:

$\Xi_\sigma : v \mapsto \vartheta_{\sigma,\tau_\sigma,v}$

Game valuations are used to measure the performance of a strategy of player 0: for a fixed
strategy $\sigma$ of player 0 and a node $v$, the associated valuation essentially states which is the worst
cycle that can be reached from $v$ conforming to $\sigma$ as well as the worst loopless path leading to that
cycle (also conforming to $\sigma$).

We also write $v \prec_\sigma u$ to compare the $\Xi_\sigma$-valuations of two nodes, i.e. to abbreviate
$\Xi_\sigma(v) \prec \Xi_\sigma(u)$.

A run of the strategy improvement algorithm can be expressed by a sequence of improving
game valuations; a partial ordering on game valuations is quite naturally defined as follows:

$\Xi \lhd \Xi' :\iff (\Xi(v) \preceq \Xi'(v) \text{ for all } v \in V) \text{ and } (\Xi \neq \Xi')$

A valuation $\Xi_\sigma$ can be used to create a new strategy of player 0. The strategy improvement
algorithm is only allowed to select new strategy decisions for player 0 occurring in the improvement
arena $\mathcal{A}_{G,\sigma} := (V, V_0, V_1, E', \Omega)$ where

$vE'u :\iff vEu \text{ and } (v \in V_1 \text{ or } (v \in V_0 \text{ and } \sigma(v) \preceq_\sigma u))$

Thus all edges performing worse than the current strategy are removed from the game. A
strategy $\sigma$ is improvable iff there is a node $v \in V_0$ and a node $u \in V$ with $vEu$ s.t. $\sigma(v) \prec_\sigma u$.

An improvement policy now selects a strategy for player 0 in a given improvement arena. More
formally: an improvement policy is a map $\mathcal{I}_G : S_0(G) \to S_0(G)$ fulfilling the following two
conditions for every strategy $\sigma$.

(1) For every node $v \in V_0$ it holds that $(v, \mathcal{I}_G(\sigma)(v))$ is an edge in $\mathcal{A}_{G,\sigma}$.

(2) If $\sigma$ is improvable then there is a node $v \in V_0$ s.t. $\sigma(v) \prec_\sigma \mathcal{I}_G(\sigma)(v)$.

We say that an edge $(v, u)$ is an improving edge w.r.t. $\sigma$ iff $v \in V_0$, $u \in vE$, $\sigma(v) \neq u$ and
$\sigma(v) \prec_\sigma u$.

Jurdziński and Vöge proved in their work that every strategy that is improved by an improvement
policy can only result in strategies with valuations strictly better than the valuation of the
original strategy.

Theorem 3.2 ([VJ00]). Let $G$ be a parity game, $\sigma$ be an improvable strategy and $\mathcal{I}_G$ be an improvement
policy. We have $\Xi_\sigma \lhd \Xi_{\mathcal{I}_G(\sigma)}$.

If a strategy is not improvable, the strategy iteration procedure comes to an end and the winning 
sets for both players as well as associated winning strategies can be easily derived from the given 
valuation. 

Theorem 3.3 ([VJ00]). Let $G$ be a parity game and $\sigma$ be a non-improvable strategy. Then the
following holds:

(1) $W_0 = \{v \mid \Xi_\sigma(v) = (w, \_, \_) \text{ and } w \in V_\oplus\}$

(2) $W_1 = \{v \mid \Xi_\sigma(v) = (w, \_, \_) \text{ and } w \in V_\ominus\}$

(3) $\sigma$ is a winning strategy for player 0 on $W_0$

(4) $\tau_\sigma$ is a winning strategy for player 1 on $W_1$

(5) $\sigma$ is $\lhd$-optimal

The strategy iteration starts with an initial strategy $\iota_G$ and runs for a given improvement policy $\mathcal{I}_G$
as outlined in the pseudo-code of Algorithm 1.

Algorithm 1 Strategy Iteration

  $\sigma \leftarrow \iota_G$
  while $\sigma$ is improvable do
    $\sigma \leftarrow \mathcal{I}_G(\sigma)$
  end while
  return $W_0$, $W_1$, $\sigma$, $\tau_\sigma$ as in Theorem 3.3



4. Improvement Policies 

There are two major deterministic improvement policies that we consider here, namely the locally
optimizing policy due to Jurdziński and Vöge [VJ00] and the globally optimizing policy by Schewe
[Sch08].

The locally optimizing policy $\mathcal{I}_G^{\mathrm{loc}}$ selects a most profitable strategy decision in every point
with respect to the current valuation. More formally, it holds for every strategy $\sigma$, every player 0
node $v$ and every $w \in vE$ that $w \preceq_\sigma \mathcal{I}_G^{\mathrm{loc}}(\sigma)(v)$.

Lemma 4.1 ([VJ00]). The locally optimizing policy can be computed in polynomial time.

This policy is generally considered to be the most natural choice, particularly because it directly
corresponds to the canonical versions of strategy iteration in related parts of game theory like
discounted payoff games or simple stochastic games. We will present a family of games on which
the algorithm parameterized with this policy requires exponentially many iterations.
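
To make the interplay of valuation, counterstrategy and locally optimizing policy concrete, here is a small self-contained sketch of strategy iteration. It is not the polynomial-time procedure referred to in Lemma 3.1 and Lemma 4.1: the optimal counterstrategy is found by brute-force enumeration, which is only feasible for tiny games, and the sort key is merely one way to realize the ordering on node valuations. All function names and the three-node example game are assumptions made for illustration.

```python
from itertools import product


def reward(p):
    return p if p % 2 == 0 else -p


def node_valuation(prio, succ, v):
    """Follow the unique play from v and return (cycle node, path set, length)."""
    seen, path, u = {}, [], v
    while u not in seen:
        seen[u] = len(path)
        path.append(u)
        u = succ[u]
    w1 = max(path[seen[u]:], key=prio.get)          # dominating cycle node
    k = path.index(w1)
    return (w1, frozenset(x for x in path[:k] if prio[x] > prio[w1]), k)


def sort_key(prio, order, val):
    """Tuple whose natural order agrees with the ordering on node valuations."""
    w1, path_set, k = val
    # One entry per node in decreasing relevance: even nodes help, odd nodes hurt.
    signs = tuple(
        (1 if prio[x] % 2 == 0 else -1) if x in path_set else 0 for x in order
    )
    return (reward(prio[w1]), signs, k if prio[w1] % 2 else -k)


def game_valuation(prio, owner, edges, sigma):
    """Brute-force an optimal counterstrategy; return valuations and sort keys."""
    order = sorted(prio, key=prio.get, reverse=True)
    p1_nodes = [v for v in order if owner[v] == 1]
    best = None
    for choice in product(*(edges[v] for v in p1_nodes)):
        succ = {**sigma, **dict(zip(p1_nodes, choice))}
        vals = {v: node_valuation(prio, succ, v) for v in order}
        keys = {v: sort_key(prio, order, vals[v]) for v in order}
        # An optimal counterstrategy is pointwise minimal, so it passes this
        # test against any earlier candidate and is only replaced by an equal one.
        if best is None or all(keys[v] <= best[1][v] for v in order):
            best = (vals, keys)
    return best


def strategy_iteration(prio, owner, edges, sigma):
    """Iterate the locally optimizing policy until no improving edge is left."""
    iterations = 0
    while True:
        vals, keys = game_valuation(prio, owner, edges, sigma)
        improved = {}
        for v, current in sigma.items():
            best = max(edges[v], key=keys.get)      # best-valued successor
            improved[v] = best if keys[best] > keys[current] else current
        if improved == sigma:
            break
        sigma, iterations = improved, iterations + 1
    w0 = {v for v in prio if prio[vals[v][0]] % 2 == 0}
    return sigma, w0, iterations


prio = {"x": 3, "y": 4, "z": 1}
owner = {"x": 0, "y": 1, "z": 1}
edges = {"x": ["y", "z"], "y": ["y", "x"], "z": ["z"]}
result = strategy_iteration(prio, owner, edges, {"x": "z"})
```

On this example, a single improvement step switches `x` from `z` to `y`, after which player 0 wins from `x` and `y` while player 1 keeps the sink `z`.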
The globally optimizing policy $\mathcal{I}_G^{\mathrm{glo}}$, on the other hand, computes a globally optimal successor
strategy in the sense that the associated valuation is the best among all allowed successor strategies.
More formally, given a parity game $G$, an improvable strategy $\sigma$ and the improved strategy
$\sigma^* = \mathcal{I}_G^{\mathrm{glo}}(\sigma)$, we have for an arbitrary strategy $\sigma'$ in the arena $\mathcal{A}_{G,\sigma}$ that $\Xi_{\sigma'} \unlhd \Xi_{\sigma^*}$.

The policy can be interpreted as providing strategy improvement with a one-step lookahead; it
computes the optimal strategy among all possible strategies that can be reached by a single improvement
step.

The interested reader is pointed to Schewe's paper [Sch08] for all the details on how to effectively
compute the optimal strategy update.

Theorem 4.2 ([Sch08]). The globally optimizing policy can be computed in polynomial time.

We will also explain how to adapt the presented family of games in order to enforce exponentially
many strategy iterations on them when parameterized with the globally optimal policy.

Vöge mentions without proof in his thesis that there is an improvement policy that requires at
most $|V|$ many iterations to find its fixed point. We find this fact to be quite remarkable and give a
short proof of it in the following.

Lemma 4.3 ([Vög00]). Let $G$ be a parity game. There is an improvement policy $\mathcal{I}_G^{\mathrm{lin}}$ s.t. the
strategy improvement algorithm requires at most $|V|$ many iterations.

Proof. Let $G = (V, V_0, V_1, E, \Omega)$ be a parity game and let $\sigma^*$ be a $\lhd$-optimal strategy. We define
the improvement policy $\mathcal{I}_G^{\mathrm{lin}}$ as follows.

$\mathcal{I}_G^{\mathrm{lin}}(\sigma)(v) := \sigma^*(v)$ if $(v, \sigma^*(v)) \in \mathcal{A}_{G,\sigma}$, and $\mathcal{I}_G^{\mathrm{lin}}(\sigma)(v) := \sigma(v)$ otherwise



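We will show in one go that $\mathcal{I}_G^{\mathrm{lin}}$ is indeed an improvement policy and that the strategy iteration
parameterized with $\mathcal{I}_G^{\mathrm{lin}}$ requires at most $|V_0|$ iterations on $G$, by verifying that

$m(\sigma) \subsetneq V_0 \Rightarrow m(\sigma) \subsetneq m(\mathcal{I}_G^{\mathrm{lin}}(\sigma))$

for all $\sigma$ where $m(\sigma) = \{v \in V_0 \mid \sigma(v) = \sigma^*(v)\}$.

Let $\sigma$ be a strategy s.t. $m(\sigma) \subsetneq V_0$. Since $m(\sigma) \subseteq m(\mathcal{I}_G^{\mathrm{lin}}(\sigma))$ holds by definition, we simply
need to show that there is at least one node $v \in V_0$ with $\sigma(v) \neq \sigma^*(v)$ and $(v, \sigma^*(v)) \in \mathcal{A}_{G,\sigma}$.
Consider the game $G' = (V, V_0, V_1, F, \Omega)$ where

$F = \{(v, w) \in E \mid v \in V_1 \text{ or } (v, w) \in \sigma \text{ or } (v, w) \in \sigma^*\}$

It is easy to see that $\mathcal{A}_{G',\sigma} \subseteq \mathcal{A}_{G,\sigma}$ and also that $\sigma^*$ is a $\lhd$-optimal strategy w.r.t. $G'$. As $\sigma$ is
not optimal, there must be at least one proper improvement edge $(v, w) \in \mathcal{A}_{G',\sigma} \subseteq \mathcal{A}_{G,\sigma}$. By definition of
$G'$, it follows that $\sigma(v) \neq w$ and $\sigma^*(v) = w$. □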

One may be misled to combine the existence of an improvement policy $\mathcal{I}_G^{\mathrm{lin}}$ that enforces
at most linearly many iterations with the existence of the improvement policy $\mathcal{I}_G^{\mathrm{glo}}$ that selects
the optimal successor strategy in each iteration, in order to propose that $\mathcal{I}_G^{\mathrm{glo}}$ should also enforce
linearly many iterations in the worst case.

The reason why this proposition is incorrect lies in the intransitivity of optimality of strategy
updates. Although it is true that $\mathcal{I}_G^{\mathrm{lin}}(\sigma) \unlhd \mathcal{I}_G^{\mathrm{glo}}(\sigma)$ for every strategy $\sigma$, this is not necessarily the
case for iterated applications, i.e. $\mathcal{I}_G^{\mathrm{lin}}(\mathcal{I}_G^{\mathrm{lin}}(\sigma)) \unlhd \mathcal{I}_G^{\mathrm{glo}}(\mathcal{I}_G^{\mathrm{glo}}(\sigma))$ does not necessarily hold for all
strategies $\sigma$.
5. Sink Games

Every approach trying to construct a game family of polynomial size that requires super-polynomially
many iterations to be solved by strategy iteration (no matter which policy the algorithm is parameterized
with) needs to focus on the second component of game valuations: there are only linearly
many different values for the first and third component while there are exponentially many for the
second.

Particularly, as there are at most linearly many different cycle nodes that can occur in valuations 
during a run, there is no real benefit in actually using different cycle nodes. Hence our basic layout 
of a game exploiting exponential behavior consists of a complex structure leading to one single 
loop - the only cycle node that will occur in valuations (such structures can easily be identified 
by preprocessing, but obviously it is not very difficult to obfuscate the whole construction without 
really altering its effect on the strategy iteration). In this setting, the strategy iteration algorithm is 
just improving the paths leading to the cycle node. 

More formally: we call a parity game $G$ (in combination with an initial strategy $\iota_G$) a 1-sink
game iff the following two properties hold:

(1) Sink Existence: there is a node $v^*$ (called the 1-sink of $G$) with $v^* E v^*$ and $\Omega(v^*) = 1$ reachable
from all nodes; also, there is no other node $w$ with $\Omega(w) \leq \Omega(v^*)$.

(2) Sink Seeking: for each player 0 strategy $\sigma$ with $\Xi_{\iota_G} \unlhd \Xi_\sigma$ and each node $w$ it holds that the cycle
component of $\Xi_\sigma(w)$ equals $v^*$.

Obviously, a 1-sink game is won by player 1. Note that comparing node valuations in a 1-sink 
game can be reduced to comparing the path components of the respective node valuations, for two 
reasons. First, the cycle component remains constant. Second, the path-length component equals 
the cardinality of the path component, because all nodes except the sink node are more relevant than 
the cycle node itself. In the case of a 1-sink game, we will therefore identify node valuations with 
their path component. 

It is fairly easy to prove that a game is indeed a 1-sink game. One simply has to check that the
sink existence property holds by looking at the graph, that the game is completely won by player 1,
and that the 1-sink is the cycle component of all nodes of the initial strategy.

Lemma 5.1. Let $G$ be a parity game fulfilling the sink existence property w.r.t. $v^*$. $G$ is a 1-sink
game iff $G$ is completely won by player 1 (i.e. $W_1 = V$) and for each node $w$ it holds that the cycle
component of $\Xi_{\iota_G}(w)$ equals $v^*$.

Proof. The "only-if" part is trivial. For the "if" part, we need to show that the sink seeking property
holds. Let $\sigma$ be a player 0 strategy with $\Xi_{\iota_G} \unlhd \Xi_\sigma$, $w$ be an arbitrary node and $u$ be the cycle
component of $\Xi_\sigma(w)$. Due to the fact that $G$ is completely won by player 1, $u$ has to be of odd
priority. Also, since $\Xi_{\iota_G} \unlhd \Xi_\sigma$, it holds that $\Omega(u) \leq \Omega(v^*)$, implying $u = v^*$ by the sink existence
property. □
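
The graph-level part of this check, the sink existence property, can be sketched as follows. The sketch only covers that property; whether the game is completely won by player 1 and whether the initial strategy leads every node to the sink must be checked separately. The function name `find_one_sink` and the dictionary representation are assumptions.

```python
def find_one_sink(priority, edges):
    """Return the node satisfying the sink existence property, or None."""
    candidates = [v for v, p in priority.items() if p == 1 and v in edges[v]]
    for sink in candidates:
        # No other node may have a priority less than or equal to 1.
        if any(p <= 1 for v, p in priority.items() if v != sink):
            continue
        # Backward reachability: collect all nodes that can reach the sink.
        reaching, frontier = {sink}, [sink]
        while frontier:
            target = frontier.pop()
            for v, successors in edges.items():
                if target in successors and v not in reaching:
                    reaching.add(v)
                    frontier.append(v)
        if reaching == set(priority):
            return sink
    return None


priority = {"s": 1, "a": 4, "b": 3}
good = {"s": ["s"], "a": ["b", "s"], "b": ["a"]}
bad = {"s": ["s"], "a": ["b"], "b": ["b"]}      # a and b cannot reach s
```

In the first graph every node can reach `s`, so `s` is reported as the 1-sink; in the second graph the check fails.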

There is another reason why 1-sink games are an interesting subclass of parity games: we will 
see later that the strategy iteration on a discounted payoff game that has been induced by the canonic 
reduction from a 1-sink parity game, directly corresponds to the strategy iteration on the original 
1-sink parity game. This connection between discounted payoff games and 1-sink parity games 
allows us to directly transfer the lower bound to discounted payoff games. 
6. Lower Bound for the Locally Optimizing Policy

The lower bound construction for the locally optimizing policy is a family of 1-sink parity games
that implement a binary counter. In order to reduce the overall complexity of the games, our construction
relies on unbounded edge outdegree, yielding a quadratic number of edges in total. We
will discuss in the next section how the number of edges can be reduced to a linear number and even
how to get binary outdegree.

The implementation of the binary counter is based on a structure called simple cycle that allows
us to encode a single bit state in a given strategy $\sigma$. By having $n$ such simple cycles, we can represent
every state of an $n$-bit binary counter. In order to allow strategy improvement to perform the transitions of
the binary counter, we need to embed the simple cycles in a more complicated structure called
cycle gate, and to connect the cycle gates of the different bits with each other and with an additional
structure called deceleration lane.

This section is organized as follows. First, we consider the three gadgets that will be used in our 
lower bound construction, namely simple cycles, the deceleration lane and cycle gates. Then, we 
present the full construction of our lower bound family and give a high-level description of strategy 
iteration on these games. Finally, we prove that strategy iteration on the games indeed follows the 
high-level description. 

For the presentation of the gadgets, we assume the context of a 1-sink parity game. The labelings
and priorities of the gadgets will match the final priorities of the lower bound family.

Gadgets consist of three kinds of nodes: input nodes, output nodes and internal nodes. Input 
nodes are nodes that will have incoming edges from outside of the gadget, output nodes will have 
outgoing edges to the outside of the gadget and internal nodes will not be directly connected to the 
outside of the gadget. 

In the context of a 1-sink game $G$ and a strategy $\sigma$, we will sometimes say that a node $v$ reaches
a node $w$ to denote the fact that $w$ lies on the path $\pi_{\sigma,\tau_\sigma,v}$.

6.1. Simple Cycles. The binary counter will contain a representation of $n$ bits that are realized by
$n$ instances of a gadget called a cycle gate. The most important part of a cycle gate is the simple
cycle that we will introduce first. We fix some index $i$ for the simple cycle gadget for the sake of
this subsection in order to have consistent node labelings.

A simple cycle consists of one player 0 controlled internal node $d_i$ that is connected to a set
of external nodes $D_i$ in the rest of the graph, and one player 1 controlled input node $e_i$. The node
$e_i$ itself is connected to $d_i$ (therefore the name simple cycle) and to one output node $h_i \notin D_i$. We
note that the nodes $e_i$ are the only player 1 controlled nodes with real choices in the complete lower
bound construction.

All priorities of the simple cycle are based on an odd priority $p_i$. Intuitively, $p_i$ is considered
to be a very small priority compared to the priorities of the other nodes in the external graph that
the simple cycle is connected to. We will implicitly assume this in the following.

See Figure 1 for a simple cycle of index 1 with $p_1 = 3$. The players, priorities and edges are
described in Table 1.




Figure 1 : Simple Cycle 
Node          Player   Priority       Successors
$d_i$         0        $p_i$          $\{e_i\} \cup D_i$
$e_i$         1        $p_i + 1$      $\{d_i, h_i\}$
$h_i$         ?        $> p_i + 1$    ?
$w \in D_i$   ?        $> p_i + 1$    ?

Table 1: Description of the Simple Cycle



Given a strategy $\sigma$, we say that the cycle is closed iff $\sigma(d_i) = e_i$ and open otherwise. A closed
cycle corresponds to a bit which is set while an open cycle corresponds to an unset bit.
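
Reading the counter state off a strategy is then a direct lookup. The sketch below assumes that nodes are named by strings such as `"d1"` and `"e1"`, and it assumes that index 1 is the least significant bit; both conventions are illustrative and are not fixed by the text above.

```python
def counter_bits(sigma, n):
    """bits[i - 1] is 1 iff simple cycle i is closed, i.e. sigma(d_i) = e_i."""
    return [1 if sigma[f"d{i}"] == f"e{i}" else 0 for i in range(1, n + 1)]


def counter_value(bits):
    # Assumption: the cycle with index 1 holds the least significant bit.
    return sum(bit << position for position, bit in enumerate(bits))


sigma = {"d1": "e1", "d2": "w", "d3": "e3"}   # cycles 1 and 3 closed, cycle 2 open
bits = counter_bits(sigma, 3)
```

Under these assumptions the example strategy encodes the bit pattern with bits 1 and 3 set.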

The main idea now is to assign priorities to the simple cycle in such a way that the simple cycle 
is won by player 0, i.e. the most relevant node on the cycle needs to have an even priority. This has 
important consequences for the behaviour of the player 1 controlled node. 

First, assume that $\sigma(d_i) = e_i$. The optimal counter-strategy here is $\tau_\sigma(e_i) = h_i$, since otherwise
player 0 would win the cycle, which is impossible with $G$ being a 1-sink game. Player 0 is therefore
able to force player 1 to move out of the cycle; in other words, setting a bit corresponds to forcing
player 1 out of the cycle. In a set bit, the valuation of $d_i$ is essentially the valuation of $h_i$, i.e.
$\Xi_\sigma(d_i) = \Xi_\sigma(h_i) \cup \{d_i, e_i\}$.

Second, assume that $\sigma(d_i) = w$ for some $w \in D_i$, and that $w \prec_\sigma h_i$. It follows that $d_i \prec_\sigma h_i$,
hence $\tau_\sigma(e_i) = d_i$. The interesting part is now that $\Xi_\sigma(e_i) = \Xi_\sigma(w) \cup \{d_i, e_i\}$, i.e. $e_i$ is an
improving node for $d_i$ (since $\Xi_\sigma(w) \triangle \Xi_\sigma(e_i) = \{d_i, e_i\}$), but updating to $e_i$ would yield a much
greater reward than just $\Xi_\sigma(e_i)$ (namely $\Xi_\sigma(h_i) \cup \{e_i\}$ by forcing player 1 to leave the cycle).

Assume now that $w' \in D_i$ with $w \prec_\sigma w'$ but $w' \prec_\sigma h_i$. Obviously, $w'$ and $e_i$ are improving
nodes for $d_i$, but $e_i \prec_\sigma w'$, hence by the locally optimizing policy, player 0 switches to $w'$ although
$e_i$ might give a much better valuation. In other words, by moving to $d_i$, the player 1 node hides the
fact that there is a highly profitable node on the other side.

We formalize the behaviour of the simple cycle in two lemmas. The first describes the valuation
of $e_i$ depending on the state of the simple cycle and the second explains the switching behaviour of
the player 0 controlled node. The claimed result can easily be obtained by tracing the paths that the
strategies take through the gadget, and then comparing valuations.

Lemma 6.1. Let $\sigma$ be a strategy. The following holds:

(1) If cycle $i$ is closed, we have $\tau_\sigma(e_i) = h_i$.

(2) If cycle $i$ is open and $h_i \prec_\sigma \sigma(d_i)$, we have $\tau_\sigma(e_i) = h_i$.

(3) If cycle $i$ is open and $\sigma(d_i) \prec_\sigma h_i$, we have $\tau_\sigma(e_i) = d_i$.

Lemma 6.2. Let $\sigma$ be a strategy and $w = \max_{\prec_\sigma} D_i$. Let $\sigma' = \mathcal{I}^{\mathrm{loc}}(\sigma)$. The following holds:

(1) If cycle $i$ is closed and $w \prec_\sigma h_i$, then cycle $i$ is $\sigma'$-closed ("closed cycle remains closed").

(2) If cycle $i$ is open, and $\sigma(d_i) \neq w$ or $h_i \prec_\sigma w$, we have $\sigma'(d_i) = w$ ("open cycle remains open").

(3) If cycle $i$ is open, $\sigma(d_i) = w$ and $w \prec_\sigma h_i$, then cycle $i$ is $\sigma'$-closed ("open cycle closes").

(4) If cycle $i$ is closed and $h_i \prec_\sigma w$, we have $\sigma'(d_i) = w$ ("closed cycle opens").
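The four cases of Lemma 6.2 can be read as a small decision rule. The following Python sketch is only an illustration of that reading: the function name and the three boolean inputs are hypothetical, and it assumes that valuations are strictly totally ordered, so that "$h_i \prec_\sigma w$" is the negation of "$w \prec_\sigma h_i$".

```python
def simple_cycle_step(closed: bool, on_best_w: bool, w_below_h: bool) -> str:
    """One locally optimizing step for d_i, following the cases of Lemma 6.2.

    closed    -- sigma(d_i) = e_i
    on_best_w -- sigma(d_i) = w, where w is the best-valued node of D_i
                 (only meaningful for an open cycle)
    w_below_h -- w is valued strictly below h_i
    Returns "closed" if d_i points to e_i afterwards, and "w" if it points to w.
    """
    if closed:
        # Items (1) and (4): a closed cycle stays closed iff w is below h_i.
        return "closed" if w_below_h else "w"
    # Items (2) and (3): an open cycle closes only if d_i already sits on w
    # and w is below h_i; otherwise d_i moves to (or stays on) w.
    if on_best_w and w_below_h:
        return "closed"
    return "w"


# Postponing closure: while a fresh best-valued w shows up in every iteration,
# d_i is never already on w, so the cycle stays open.
history = [simple_cycle_step(False, on_best_w=not fresh, w_below_h=True)
           for fresh in (True, True, False)]
```

In this toy run the cycle only closes in the last step, when no fresh best-valued node is supplied.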

Open simple cycles have the important property that we can postpone closing them by supplying
them with new nodes $w \in D_i$ in each iteration s.t. $\sigma(d_i) \prec_\sigma w$. We will use this property in the
construction of our binary counter. Since we do not want to set all bits at the same time, but rather one
by one, we need to make sure that unset bits which are not supposed to be set remain unset for some
time (more precisely, until the respective bit represents the least unset bit), and this will be realized
by this property. The device that supplies us with new best-valued external nodes in each iteration
is called deceleration lane and will be described next.



6.2. Deceleration Lane. A deceleration lane has several, say m, input nodes and some output 
nodes, called roots. The lower bound construction will only require a deceleration lane with two 
roots s and r, however, it would be easy to generalize the construction of deceleration lanes to an 
arbitrary number of roots. 

More formally, a deceleration lane consists of $m$ internal nodes $t_1, \ldots, t_m$, one additional internal
node $c$, $m$ input nodes $a_1, \ldots, a_m$ and two output nodes $s$ and $r$, called roots of the deceleration
lane.

All priorities of the deceleration lane are based on some odd priority $p$. We assume that all root
nodes have a priority greater than $p + 2m + 1$. See Figure 2 for a deceleration lane with $m = 6$ and
$p = 15$. The players, priorities and edges are described in Table 2.



Node            Player   Priority            Successors
$t_1$           0        $p$                 $\{s, r, c\}$
$t_{i>1}$       0        $p + 2i - 2$        $\{s, r, t_{i-1}\}$
$c$             0        $p + 2m + 1$        $\{s, r\}$
$a_i$           1        $p + 2i - 1$        $\{t_i\}$
$s$             ?        $> p + 2m + 1$      ?
$r$             ?        $> p + 2m + 1$      ?

Table 2: Description of the Deceleration Lane




A deceleration lane serves the following purpose. Assume that one of the output nodes, say r, 
has the better valuation compared to the other root node, and assume further that this setting sustains 
for some iterations. 
The input nodes $a_1, \ldots, a_m$ now serve as entry points, and all reach the best-valued root
$r$ via some internal nodes. The valuation ordering of the input nodes depends on the iteration:
at first, $a_1$ has a better valuation than all other input nodes. Then, $a_2$ has a better valuation than all
other input nodes, and so on.

This process continues until the other output node, say $s$, has a better valuation than $r$. Within
the next iteration, the internal nodes perform a resetting step s.t. all input nodes eventually reach the
new root node. One iteration after that, $a_1$ has the best valuation compared to all other input nodes
again.

In other words, by giving one of the roots, say $s$, a better valuation than the other root, say $r$,
it is possible to reset and therefore reuse the lane. In fact, the lower bound construction will
use a deceleration lane with two roots $s$ and $r$, and will employ $s$ only for resetting, i.e. after some
iterations with $r \succ_\sigma s$, there will be one iteration with $s \succ_\sigma r$ and right after that again $r \succ_\sigma s$.

From an abstract point of view, we describe the state of a deceleration lane by which of the two
roots is chosen and by how many $t_i$ nodes are already moving down to $c$. Formally, we say that $\sigma$ is
in deceleration state $(x, j)$ (where $x \in \{s, r\}$ and $0 < j \leq m + 1$ a natural number) iff

(1) $\sigma(c) = x$,

(2) $\sigma(t_1) = c$ if $j > 1$,

(3) $\sigma(t_i) = t_{i-1}$ for all $1 < i < j$, and

(4) $\sigma(t_i) = x$ for all $j \leq i$.

We say that the deceleration lane is rooted in $x$ if $\sigma$ is in state $(x, *)$, and that the index is $i$ if $\sigma$ is in
state $(*, i)$. Whenever a strategy $\sigma$ is in state $(x, i)$, we define $\mathrm{root}(\sigma) = x$ and $\mathrm{ind}(\sigma) = i$. In this
case, we say that the strategy is well-behaved.
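The definition of a deceleration state can be turned into a direct check. The Python sketch below is an illustration under stated assumptions: nodes are encoded as the hypothetical strings "c", "s", "r", "t1", ..., and the index conditions are read as $0 < j \leq m + 1$ and $\sigma(t_i) = x$ for all $j \leq i$.

```python
from typing import Dict, Optional, Tuple


def deceleration_state(sigma: Dict[str, str], m: int) -> Optional[Tuple[str, int]]:
    """Return (x, j) if sigma on c, t_1..t_m is in deceleration state (x, j), else None."""
    x = sigma["c"]                      # condition (1): sigma(c) = x
    if x not in ("s", "r"):
        return None
    for j in range(1, m + 2):           # candidate indices 1..m+1
        matches = True
        for i in range(1, m + 1):
            if i < j:
                # conditions (2) and (3): t_1 moves to c, t_i moves to t_{i-1}
                expected = "c" if i == 1 else f"t{i - 1}"
            else:
                # condition (4): all remaining t_i exit directly to the root x
                expected = x
            if sigma[f"t{i}"] != expected:
                matches = False
                break
        if matches:
            return (x, j)
    return None                         # not well-behaved


# A lane with m = 4 in which only t_1 has started to move down to c.
example = {"c": "r", "t1": "c", "t2": "r", "t3": "r", "t4": "r"}
state = deceleration_state(example, 4)
```

A strategy for which the function returns `None` is not well-behaved in the sense of the definition.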

We formalize the behaviour of the deceleration lane in two lemmas. The first describes the
ordering of the valuations of the input nodes depending on the state $(x, i)$ of the deceleration lane:
(1) if the ordering of the root nodes changes, all input nodes have a worse valuation than the better
root, and (2) otherwise the best-valued input node is $a_{i-1}$. The second explains the switching
behaviour of the player 0 controlled nodes: (1) if the ordering of the root nodes changes, then the
whole lane resets, and (2) otherwise the lane assembles further, providing a new best-valued input
node.

Lemma 6.3. Let $\sigma$ be a strategy in deceleration state $(x, i)$. Let $\bar{x}$ denote the other root. Then

(1) $x \prec_\sigma \bar{x}$ implies $a_j \prec_\sigma \bar{x}$ for all $j$ ("resetting results in unprofitable lane").

(2) $\bar{x} \prec_\sigma x$ implies $x \prec_\sigma a_i \prec_\sigma \ldots \prec_\sigma a_m \prec_\sigma c \prec_\sigma a_1 \prec_\sigma \ldots \prec_\sigma a_{i-1}$ ("new best-valued node
in each iteration").

Lemma 6.4. Let $\sigma$ be a strategy that is in deceleration state $(x, i)$. Let $\bar{x}$ denote the other root. Let
$\sigma' = \mathcal{I}^{\mathrm{loc}}(\sigma)$. Then

(1) $x \prec_\sigma \bar{x}$ implies that $\sigma'$ is in state $(\bar{x}, 1)$ ("lane resets").

(2) $\bar{x} \prec_\sigma x$ implies that $\sigma'$ is in state $(x, \min(i, m) + 1)$ ("lane assembles one step at a time").

(3) $\sigma'$ is well-behaved ("always ending up with well-behaved strategies").
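The state transitions of Lemma 6.4 can be traced with a few lines of Python. This is a sketch only: the function name is hypothetical, the lane is reduced to its abstract state $(x, i)$, and which root currently has the better valuation is supplied from outside as a boolean.

```python
from typing import Tuple

State = Tuple[str, int]  # (root, index) of a well-behaved lane strategy


def lane_step(state: State, other_root_better: bool, m: int) -> State:
    """Abstract successor state of a deceleration lane with m internal nodes t_i."""
    x, i = state
    if other_root_better:
        # Item (1): the lane resets and is now rooted in the other root.
        return ("s" if x == "r" else "r", 1)
    # Item (2): the lane assembles by one step, capped at index m + 1.
    return (x, min(i, m) + 1)


# r stays the better root for four iterations, then s is better once,
# and right after that r is better again.
m = 3
trace = [("r", 1)]
for other_better in (False, False, False, False, True, True):
    trace.append(lane_step(trace[-1], other_better, m))
```

The trace shows the index growing up to $m + 1$ and staying there until the ordering of the roots changes.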

The main purpose of a deceleration lane is to absorb the update activity of other nodes in such 
a way that wise (i.e. edges that will result in much better valuations after switching and reevalu- 
ating) strategy updates are postponed. Consider a node for instance that has more than one proper 
improving switch; the locally optimizing policy will select the edge with the best valuation to be 
switched. In order to prevent that one particular improving switch is applied for some iterations, 
one can connect the node to the input nodes of the deceleration lane. 
The particular scenario in which we will use the deceleration lane is that of simple cycles as described
in the previous subsection. We will connect the simple cycles encoding the bits of our counter to
the deceleration lane in such a way that lower cycles have fewer edges entering the deceleration
lane. This construction ensures that lower open cycles (representing unset bits) will close (i.e. set
the corresponding bit) before higher open cycles (representing higher unset bits) have their turn to
close.



6.3. Cycle Gate. The simple cycles will appear in a more complicated gadget, called cycle gate. 
We will have n different cycle gates in the game number n of the lower bound family, hence we 
fix some index i for the cycle gate gadget for the sake of this subsection in order to have consistent 
node labelings. 

Formally, a cycle gate consists of two internal nodes $e_i$ and $h_i$, two input nodes $f_i$ and $g_i$, and
two output nodes $d_i$ and $k_i$. The output node $d_i$ will be connected to a set of other nodes $D_i$ in the
game graph, and $k_i$ to some other set $K_i$ as well. The two nodes $d_i$ and $e_i$ form a simple cycle as
described earlier.

All priorities of the cycle gate are based on two odd priorities $p_i$ and $p'_i$. See Figure 3 for a
cycle gate of index 1 with $p'_1 = 3$ and $p_1 = 33$. The players, priorities and edges are described in
Table 3.



Node            Player   Priority        Successors
$d_i$           0        $p'_i$          $\{e_i\} \cup D_i$
$e_i$           1        $p'_i + 1$      $\{d_i, h_i\}$
$g_i$           0        $p'_i + 3$      $\{f_i, k_i\}$
$k_i$           0        $p_i$           $K_i$
$f_i$           1        $p_i + 2$       $\{e_i\}$
$h_i$           1        $p_i + 3$       $\{k_i\}$

Table 3: Description of the Cycle Gate

Figure 3: A Cycle Gate (index 1 with $p'_1 = 3$ and $p_1 = 33$)

The main idea behind a cycle gate is to have a pass-through structure controlled by the simple
cycle that is either very profitable or quite unprofitable. The pass-through structure of the cycle
gate has one major input node, named $g_i$, and one major output node, named $k_i$. The input node is
controlled by player 0 and connected via two paths with the output node; there is a direct edge and
a longer path leading through the interior of the cycle gate.

However, the longer path only leads to the output node if the simple cycle, consisting of one
player 0 node $d_i$ and one player 1 node $e_i$, is closed. In this case, it is possible and profitable to
reach the output node via the internal path; otherwise, this path is not accessible, and hence, the
input node has to select the unprofitable direct way to reach the output node.

We will have one additional input node, named $f_i$, that can only access the path leading through
the interior of the cycle gate, for the following purpose. Assume that the simple cycle has just been
closed and now the path leading through the interior becomes highly profitable. Hence, the next
switching event to happen will be the node $g_i$ switching from the direct path to the path through
the interior. However, it will be useful to be able to reach the highly profitable path from some
parts of the outside graph one iteration before it is accessible via $g_i$. For this reason, we include an
additional input node $f_i$ that immediately accesses the interior path.

We say that a cycle gate is closed resp. open iff the interior simple cycle is closed resp. open.
Similarly, we say that a cycle gate is accessed resp. skipped iff the access control node $g_i$ moves
through the interior ($\sigma(g_i) = f_i$) resp. directly to $k_i$.

From an abstract point of view, we describe the state of a cycle gate by a pair $(\beta_i(\sigma), \alpha_i(\sigma)) \in \{0, 1\}^2$.
The first component describes the state of the simple cycle, and the second component
gives the state of the access control node. Formally, we have the following.

(1) $\beta_i(\sigma) = 1$ iff the $i$-th cycle gate is closed, and

(2) $\alpha_i(\sigma) = 1$ iff the $i$-th cycle gate is accessed.

We formalize the behaviour of the cycle gate in two lemmas. The first describes the valuation
of all important nodes of the cycle gate, using our knowledge of simple cycles of Lemma 6.1. The
second explains the switching behaviour of the access control node. The behaviour of the simple
cycle contained in the cycle gate is described by Lemma 6.2.

Lemma 6.5. Let $\sigma$ be a strategy. Then

(1) If gate $i$ is open, we have $f_i \prec_\sigma \sigma(d_i)$.

(2) If gate $i$ is closed, we have $\sigma(k_i) \prec_\sigma f_i$.

(3) If gate $i$ is closed and skipped, we have $g_i \prec_\sigma f_i$.

(4) If gate $i$ is accessed, we have $f_i \prec_\sigma g_i$.

(5) If gate $i$ is skipped, we have $\sigma(k_i) \prec_\sigma g_i$.

Lemma 6.6. Let $\sigma$ be a strategy and $\sigma' = \mathcal{I}^{\mathrm{loc}}(\sigma)$.

(1) If gate $i$ is $\sigma$-closed, then gate $i$ is $\sigma'$-accessed ("closed gates will be accessed").

(2) If gate $i$ is $\sigma$-open and $\sigma(d_i) \prec_\sigma h_i$, then gate $i$ is $\sigma'$-skipped ("open gates with unprofitable
exit nodes will be skipped").

(3) If gate $i$ is $\sigma$-open and $h_i \prec_\sigma \sigma(d_i)$, then gate $i$ is $\sigma'$-accessed ("open gates with profitable exit
nodes will be accessed").

The last two items of Lemma 6.6 are based on the uniqueness of priorities in the game, implying
that there are no priorities between $f_i$ and $h_i$.

We will use cycle gates to represent the bit states of a binary counter: unset bits will correspond
to cycle gates with the state $(0, 0)$, set bits to the state $(1, 1)$. Setting and resetting bits therefore
traverses more than one phase, more precisely, from $(0, 0)$ over $(1, 0)$ to $(1, 1)$, and from the latter
again over $(0, 1)$ to $(0, 0)$. In particular, the second component of the cycle
gate state switches one iteration after the first component in both cases.
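The one-iteration delay between the two components can be illustrated by replaying Lemma 6.6 on a given sequence of cycle states. The Python sketch below uses hypothetical names and makes two simplifying assumptions: the sequence of cycle states $\beta_i$ is supplied from outside, and whenever the gate is open its exit node is unprofitable (the case of item (2) of Lemma 6.6).

```python
from typing import List, Tuple


def next_access(beta: int, exit_profitable: bool = False) -> int:
    """Access bit after one improvement step, following the cases of Lemma 6.6."""
    if beta == 1:
        return 1                        # closed gates will be accessed
    return 1 if exit_profitable else 0  # open gates: accessed iff the exit is profitable


def gate_trace(betas: List[int]) -> List[Tuple[int, int]]:
    """Pairs (beta, alpha) for a gate that starts skipped and sees the cycle states betas."""
    alpha = 0
    states = []
    for beta in betas:
        states.append((beta, alpha))
        alpha = next_access(beta)       # alpha follows beta one iteration later
    return states


# The cycle is open, gets closed, stays closed, is opened again and stays open.
states = gate_trace([0, 1, 1, 0, 0])
```

Under these assumptions the gate runs through exactly the sequence of states listed in the text.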

6.4. Lower Bound Construction. In this subsection, we provide the complete construction of the
lower bound family. It essentially consists of a 1-sink $x$, a deceleration lane of length $2n$ that is
connected to the two roots $s$ and $r$, and $n$ cycle gates. The simple cycles of the cycle gates are
connected to the roots and to the deceleration lane with the important detail that lower cycle gates
have fewer edges to the deceleration lane. This construction ensures that lower open cycle gates will
close before higher open cycle gates.

The output node of a cycle gate is connected to the 1-sink and to the $g_*$-input nodes of all higher
cycle gates. The $s$ root node is connected to all $f_*$-input nodes, the $r$ root node is connected to all
$g_*$-input nodes.

We now give the formal construction. The games are denoted by $G_n = (V_n, V_{n,0}, V_{n,1}, E_n, \Omega_n)$.
The sets of nodes are

$V_n := \{x, s, c, r\} \cup \{t_i, a_i \mid 1 \leq i \leq 2n\} \cup \{d_i, e_i, g_i, k_i, f_i, h_i \mid 1 \leq i \leq n\}$

The players, priorities and edges are described in Table 4. The game $G_3$ is depicted in Figure 4.



Node            Player   Priority            Successors
$t_1$           0        $4n + 3$            $\{s, r, c\}$
$t_{i>1}$       0        $4n + 2i + 1$       $\{s, r, t_{i-1}\}$
$a_i$           1        $4n + 2i + 2$       $\{t_i\}$
$c$             0        $8n + 4$            $\{s, r\}$
$d_i$           0        $4i + 1$            $\{s, e_i, r\} \cup \{a_j \mid j < 2i + 1\}$
$e_i$           1        $4i + 2$            $\{d_i, h_i\}$
$g_i$           0        $4i + 4$            $\{f_i, k_i\}$
$k_i$           0        $8n + 4i + 7$       $\{x\} \cup \{g_j \mid i < j \leq n\}$
$f_i$           1        $8n + 4i + 9$       $\{e_i\}$
$h_i$           1        $8n + 4i + 10$      $\{k_i\}$
$s$             0        $8n + 6$            $\{f_j \mid j \leq n\} \cup \{x\}$
$r$             0        $8n + 8$            $\{g_j \mid j \leq n\} \cup \{x\}$
$x$             1        $1$                 $\{x\}$

Table 4: Lower Bound Construction for the Locally Optimizing Policy



Fact 6.1. The game $G_n$ has $10 \cdot n + 4$ nodes, $1.5 \cdot n^2 + 20.5 \cdot n + 5$ edges and $12 \cdot n + 8$ as highest
priority. In particular, $|G_n| = O(n^2)$.
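The node and edge counts of Fact 6.1 can be checked mechanically against the successor column of Table 4. The Python sketch below is only a counting aid: node names are hypothetical plain strings, priorities and players are left out, and the successor sets are read as $\{s, e_i, r\} \cup \{a_j \mid j < 2i + 1\}$ for $d_i$, $\{x\} \cup \{g_j \mid i < j \leq n\}$ for $k_i$, and $j \leq n$ for the two roots.

```python
from typing import Dict, List


def build_successors(n: int) -> Dict[str, List[str]]:
    """Successor lists of G_n, following the successor column of Table 4."""
    succ: Dict[str, List[str]] = {}
    m = 2 * n                                   # length of the deceleration lane
    succ["t1"] = ["s", "r", "c"]
    for i in range(2, m + 1):
        succ[f"t{i}"] = ["s", "r", f"t{i - 1}"]
    for i in range(1, m + 1):
        succ[f"a{i}"] = [f"t{i}"]
    succ["c"] = ["s", "r"]
    for i in range(1, n + 1):
        # d_i reaches the lane inputs a_1..a_{2i}; lower cycles get fewer lane edges
        succ[f"d{i}"] = ["s", f"e{i}", "r"] + [f"a{j}" for j in range(1, 2 * i + 1)]
        succ[f"e{i}"] = [f"d{i}", f"h{i}"]
        succ[f"g{i}"] = [f"f{i}", f"k{i}"]
        succ[f"k{i}"] = ["x"] + [f"g{j}" for j in range(i + 1, n + 1)]
        succ[f"f{i}"] = [f"e{i}"]
        succ[f"h{i}"] = [f"k{i}"]
    succ["s"] = [f"f{j}" for j in range(1, n + 1)] + ["x"]
    succ["r"] = [f"g{j}" for j in range(1, n + 1)] + ["x"]
    succ["x"] = ["x"]                           # self-cycle at the 1-sink
    return succ


def count_nodes_and_edges(n: int):
    succ = build_successors(n)
    return len(succ), sum(len(targets) for targets in succ.values())
```

For example, `count_nodes_and_edges(1)` yields 14 nodes and 27 edges, in line with $10n + 4$ and $1.5n^2 + 20.5n + 5$.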

As an initial strategy we select the following $\iota_{G_n}$. It corresponds to the global counter state
in which no bit has been set.

$\iota_{G_n}(t_1) = c$, \quad $\iota_{G_n}(g_i) = k_i$, \quad $\iota_{G_n}(* \in \{t_{i>1}, c, d_i\}) = r$, \quad $\iota_{G_n}(* \in \{k_i, s, r\}) = x$

Note that $\iota_{G_n}$ in particular is well-behaved. Hence, by Lemma 6.4(3), we know that all strategies that
will occur in a run of the strategy improvement algorithm will be well-behaved.




Figure 4: Locally Optimizing Lower Bound Game G3 

We will see in the next section how the family $G_n$ can be refined in such a way that it only
comprises a linear number of edges. The reason why we present the games with a quadratic number
of edges first is that the refined family looks even more confusing and obfuscates the general
principle.

Lemma 6.7. Let $n > 0$.

(1) The game $G_n$ is completely won by player 1.

(2) $x$ is the 1-sink of $G_n$ and the cycle component of $\Xi_{\iota_{G_n}}(w)$ equals $x$ for all $w$.

Proof. Let $n > 0$.

(1) Note that the only nodes owned by player 1 with an outdegree greater than 1 are $e_1, \ldots, e_n$.
Consider the player 1 strategy $\tau$ which selects to move to $h_i$ from $e_i$ for all $i$. Now it is the case
that $G_n|_\tau$ contains exactly one cycle that is eventually reached no matter what player 0 does,
namely the self-cycle at $x$ which is won by player 1.

(2) The self-cycle at $x$ obviously is the 1-sink since it can be reached from all other nodes and has
the smallest priority 1. Since $xEx$ is the only cycle won by player 1 in $G_n|_{\iota_{G_n}}$, $x$ must be the
cycle component of each node valuation w.r.t. $\iota_{G_n}$. □

By Lemma 5.1 it follows that $G_n$ is a 1-sink game, hence it is safe to identify the valuation of a
node with its path component from now on.

6.5. Lower Bound Description and Phases. Here, we describe how the binary counter performs 
the task of counting by strategy improvement. Our games implement a full binary counter in which 
every bit is represented by a simple cycle encapsulated in a cycle gate. An unset bit i corresponds 
to an open simple cycle in cycle gate i, a set bit i corresponds to a closed simple cycle in cycle gate 
i. 

Formally, we represent the bit state of the counter by elements from $B_n = \{0, 1\}^n$. For
$b = (b_n, \ldots, b_1) \in B_n$, let $b_i$ denote the $i$-th component in $b$ for every $i \leq n$, where $b_n$ denotes the
most and $b_1$ denotes the least significant bit. By $b \oplus 1$, we denote the increment of the number
represented by $b$ by 1. The least resp. greatest bit states are denoted by $0_n$ resp. $1_n$. We refer to
the least unset bit by $\mu b := \min(\{n + 1\} \cup \{i \leq n \mid b_i = 0\})$, and similarly to the least set bit by
$\nu b := \min(\{n + 1\} \cup \{i \leq n \mid b_i = 1\})$.
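As a small illustration of these two definitions, the following Python sketch computes $\mu b$ and $\nu b$ for a bit state written as a tuple $(b_n, \ldots, b_1)$. The function names and the tuple encoding are hypothetical choices made for this example.

```python
from typing import Tuple

Bits = Tuple[int, ...]  # written as (b_n, ..., b_1), most significant bit first


def bit(b: Bits, i: int) -> int:
    """The i-th component b_i, with b_1 the least significant bit."""
    return b[len(b) - i]


def least_unset_bit(b: Bits) -> int:
    """mu b: the smallest i with b_i = 0, or n + 1 if every bit is set."""
    n = len(b)
    return min([n + 1] + [i for i in range(1, n + 1) if bit(b, i) == 0])


def least_set_bit(b: Bits) -> int:
    """nu b: the smallest i with b_i = 1, or n + 1 if no bit is set."""
    n = len(b)
    return min([n + 1] + [i for i in range(1, n + 1) if bit(b, i) == 1])


b = (0, 1, 0, 1, 1)              # n = 5, b_1 = b_2 = 1, b_3 = 0
mu, nu = least_unset_bit(b), least_set_bit(b)
```

Here `mu` is 3 and `nu` is 1; for the all-zero and all-one states the value $n + 1$ acts as a sentinel.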

From the most abstract point of view, our lower bound construction performs counting on B n . 
However, the increment of a global bit state requires more than one strategy iteration, more precisely 
four different phases that will be described next (with one phase of dynamic length). 

Every phase is defined w.r.t. a given global counter state $b \in B_n$. Let $b \in B_n$ be a global bit
state different from $1_n$.

An abstract counter performs the increment from $b$ to $b \oplus 1$ by computing $b[\mu b \mapsto 1][j < \mu b \mapsto 0]$,
i.e. by setting bit $\mu b$ and by resetting all lower bits $j < \mu b$. In the context of the games, we start in
phase 1 corresponding to $b$, and then proceed to phase 2 and phase 3 corresponding to $b[\mu b \mapsto 1]$,
from phase 3 to phase 4 corresponding to $b[\mu b \mapsto 1][j < \mu b \mapsto 0]$, and finally from phase 4 to phase
1 again. The transitions from phase 2 to phase 3 and from phase 4 to phase 1 handle the correction
of the internal structure connecting the cycles with each other.
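The two-stage increment of the abstract counter can be sketched in Python as follows. The function names and the tuple encoding $(b_n, \ldots, b_1)$ are hypothetical; the first returned state corresponds to $b[\mu b \mapsto 1]$ (phases 2 and 3), the second to $b[\mu b \mapsto 1][j < \mu b \mapsto 0]$ (phase 4 and the next phase 1).

```python
from typing import Tuple

Bits = Tuple[int, ...]  # written as (b_n, ..., b_1), most significant bit first


def increment_in_two_stages(b: Bits) -> Tuple[Bits, Bits]:
    """Return (b[mu b -> 1], b[mu b -> 1][j < mu b -> 0]) for a state b != 1_n."""
    n = len(b)
    bits = {i: b[n - i] for i in range(1, n + 1)}          # bits[i] = b_i
    mu = min([n + 1] + [i for i in range(1, n + 1) if bits[i] == 0])
    if mu == n + 1:
        raise ValueError("b is the greatest bit state and cannot be incremented")
    bits[mu] = 1                                           # set the least unset bit
    after_set = tuple(bits[i] for i in range(n, 0, -1))
    for j in range(1, mu):                                 # reset all lower bits
        bits[j] = 0
    after_reset = tuple(bits[i] for i in range(n, 0, -1))
    return after_set, after_reset


def value(b: Bits) -> int:
    """The number represented by b."""
    return int("".join(str(x) for x in b), 2)


after_set, after_reset = increment_in_two_stages((0, 1, 0, 1, 1))
```

For the example state, `after_set` is `(0, 1, 1, 1, 1)` and `after_reset` is `(0, 1, 1, 0, 0)`, i.e. 11 is incremented to 12.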

For the sake of this subsection, let $\sigma$ be a strategy and $b \in B_n$ be a global counter state. All
phases will be defined w.r.t. $\sigma$ and $b$. Let $\sigma' = \mathcal{I}^{\mathrm{loc}}(\sigma)$.

To keep everything as simple as possible and to be able to prove all the lemmas without considering
special cases, we will assume that $b$ is different from $0_n$ and that the two highest bits in $b$
are zero and remain zero, i.e. we will only use the first $n - 2$ bits for counting. Note, however, that
every bit works as intended in the counter.

Given a strategy $\sigma$, we denote the associated simple cycle state $(\beta_n(\sigma), \ldots, \beta_1(\sigma))$ by $b_\sigma$, and
the associated access state $(\alpha_n(\sigma), \ldots, \alpha_1(\sigma))$ by $a_\sigma$.

Recall that every strategy $\sigma$ occurring will be well-behaved. In addition to the deceleration lane
and the cycle gates, we have two more structures that are controlled by a strategy $\sigma$, namely the two
roots $r$ and $s$, and the cycle gate output nodes $k_i$. We write $\sigma(r) = i$ to denote that $\sigma(r) = g_i$,
and $\sigma(r) = n + 1$ if $\sigma(r) = x$; we write $\sigma(s) = i$ to denote that $\sigma(s) = f_i$, and $\sigma(s) = n + 1$ if
$\sigma(s) = x$; we write $\sigma(k_i) = j$ to denote that $\sigma(k_i) = g_j$, and $\sigma(k_i) = n + 1$ if $\sigma(k_i) = x$. We
also use a more compact notation for the strategy decision of $d_i$-nodes of open cycles. We write
$\sigma(d_i) = j$ if $\sigma(d_i) = a_j$.

Recall that we say that a strategy $\sigma$ is rooted in $s$ or $r$ if every path in the deceleration lane
conforming to $\sigma$ eventually exits to $s$ resp. $r$. Likewise, we say that $\sigma$ has index $i$ if all nodes of
the deceleration lane with smaller index $j < i$ are moving down the lane by $\sigma$, and $i$ is the first
index which is directly exiting through the root.

The first phase, called the waiting phase, corresponds to a stable strategy $\sigma$ in which open
cycles are busy waiting to be closed while the deceleration lane is assembling. Cycle gates that
correspond to set bits are closed and accessed, while cycle gates of unset bits are open and skipped,
i.e. $b = b_\sigma = a_\sigma$. The selector nodes $k_i$ move to the next higher cycle gate corresponding to a set
bit, and both roots are connected to the least set bit $\nu b$.

More formally, we say that $\sigma$ is a $b$-phase 1 strategy iff all the following conditions hold:

(1) $b = b_\sigma = a_\sigma$, i.e. set bits correspond to closed and accessed cycle gates, while unset bits
correspond to open and skipped cycle gates,

(2) $\mathrm{root}(\sigma) = r$, i.e. the strategy is rooted in $r$,

(3) $\sigma(s) = \sigma(r) = \nu b$, i.e. both roots are connected to the least set bit,

(4) $\sigma(k_i) = \min(\{j > i \mid b_j = 1\} \cup \{n + 1\})$, i.e. the selector nodes move to the next set bit,

(5) $\mathrm{ind}(\sigma) \leq 2\mu b + 2$, i.e. the deceleration lane has not passed the least unset bit, and

(6) $\sigma(d_j) \neq \mathrm{ind}(\sigma) - 1$ for all $j$ with $b_j = 0$, i.e. every open cycle node is not connected to the
best-valued node of the lane.
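Condition (4) describes a simple "next set bit" lookup. The Python sketch below illustrates it for a bit state written as a tuple $(b_n, \ldots, b_1)$; the function name and the encoding are hypothetical, and the result uses the compact notation in which $n + 1$ stands for the sink $x$.

```python
from typing import Tuple

Bits = Tuple[int, ...]  # written as (b_n, ..., b_1), most significant bit first


def selector_target(b: Bits, i: int) -> int:
    """sigma(k_i) in a b-phase 1 strategy: the next set bit above i, or n + 1 for the sink x."""
    n = len(b)
    higher_set_bits = [j for j in range(i + 1, n + 1) if b[n - j] == 1]
    return min(higher_set_bits + [n + 1])


b = (0, 1, 0, 1, 1)   # n = 5 with set bits 1, 2 and 4
targets = [selector_target(b, i) for i in range(1, 6)]
```

For this state the selectors $k_1, \ldots, k_5$ point to gates 2, 4, 4 and, for the last two, to the sink (encoded as 6).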

The only improving switches in the first phase are edges of open simple cycles and edges of the 
deceleration lane. 

Lemma 6.8. Let $\sigma$ be a $b$-phase 1 strategy with $\mathrm{ind}(\sigma) < 2\mu b + 2$. Then $\sigma'$ is a $b$-phase 1 strategy
with $\mathrm{ind}(\sigma') = \mathrm{ind}(\sigma) + 1$, and if $\mathrm{ind}(\sigma) > 1$, then $\sigma'(d_{\mu b}) = \mathrm{ind}(\sigma) - 1$.

Proof. Let $\sigma$ be a $b$-phase 1 strategy, $\Xi := \Xi_\sigma$ and $\mathrm{ind}(\sigma) < 2\mu b + 2$.

We first compute the valuations for all those nodes directly that do not involve any complicated
strategy decision of player 1. Obviously, $\Xi(x) = \emptyset$. By Lemma 6.1(1) we know that for all set bits
$i$ (i.e. $b_i = 1$) we have the following.

$\Xi(e_i) = \{e_i\} \cup \Xi(h_i)$ \quad $\Xi(d_i) = \{e_i, d_i\} \cup \Xi(h_i)$ \quad $\Xi(f_i) = \{e_i, f_i\} \cup \Xi(h_i)$

Using these equations, we are able to compute many other valuations that do not involve any
complicated strategy decision of player 1. Let $U_j = \{g_j, f_j, e_j, h_j, k_j\}$. The following holds (by $CF_p(A)$
we denote the function that returns $A$ if $p$ holds and $\emptyset$ otherwise):

$\Xi(k_i) = \{k_i\} \cup \bigcup\{U_j \mid j > i,\ b_j = 1\}$
$\Xi(h_i) = \{h_i, k_i\} \cup \bigcup\{U_j \mid j > i,\ b_j = 1\}$
$\Xi(g_i) = \{g_i, k_i\} \cup \bigcup\{U_j \mid j > i,\ b_j = 1\}$
$\Xi(r) = \{r\} \cup \bigcup\{U_j \mid b_j = 1\}$
$\Xi(s) = \{s\} \cup CF_{b \neq 0_n}(\bigcup\{U_j \mid b_j = 1\} \setminus \{g_{\nu b}\})$
$\Xi(c) = \{c, r\} \cup \bigcup\{U_j \mid b_j = 1\}$
$\Xi(t_i) = \{t_i\} \cup \Xi(r) \cup CF_{i < \mathrm{ind}(\sigma)}(\{t_j \mid j < i\} \cup \{c\})$
$\Xi(a_i) = \{a_i\} \cup \Xi(t_i)$

It is easy to see that we have the following orderings on the nodes specified above.

$s \prec_\sigma r \prec_\sigma a_* \prec_\sigma h_*$ \quad (a)

By Lemma 6.1(2), it follows from (a) that $\tau_\sigma(e_i) = d_i$ for all unset bits $i$ (i.e. $b_i = 0$), hence we are
able to compute the valuations of the remaining nodes.

$\Xi(e_i) = \{e_i\} \cup \Xi(d_i)$ \quad $\Xi(f_i) = \{e_i, f_i\} \cup \Xi(d_i)$

This completes the valuation of $\Xi$ for all nodes.

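It is easy to see that for every $i$ with $b_i = 0$ and every $j$ with $b_j = 1$ s.t. there is no $i < i' < j$
with $b_{i'} = 1$, the following holds:

$f_i \prec_\sigma f_j$ \quad $g_i \prec_\sigma g_j$ \quad (b)

Also, for $i > j$ with $b_i = 1$ and $b_j = 1$ we have

$f_i \prec_\sigma f_j$ \quad $g_i \prec_\sigma g_j$ \quad (c)

By (a) and Lemma 6.3(2) we obtain that the following holds:

$a_{\mathrm{ind}(\sigma)} \prec_\sigma \ldots \prec_\sigma a_{2n} \prec_\sigma a_1 \prec_\sigma \ldots \prec_\sigma a_{\mathrm{ind}(\sigma)-1}$ \quad (d)

We are now ready to prove that $\sigma'$ is of the desired form.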

(1) By Lemma 6.2(1) and (a) we derive that closed cycles remain closed. By Lemma 6.6(1) we
derive that closed cycles remain accessed. By (a) and Lemma 6.6(2) we derive that open cycles
remain skipped.

By phase 1 condition (5), phase 1 condition (6) and (d), it follows that for every $j$ with $b_j = 0$,
there is an improving node $a_*$ for $d_j$. By Lemma 6.2(2), we conclude that open cycles remain
open.

(2) By (a) and Lemma 6.4(2).

(3) By (b) and (c).

(4) By (b) and (c).

(5) By Lemma 6.4(2).

(6) By Lemma 6.4(2).

If $\mathrm{ind}(\sigma) > 1$, then we have by (a) and (d) that $\sigma'(d_{\mu b}) = \mathrm{ind}(\sigma) - 1$.

□

The first phase ends when a simple cycle corresponding to an unset bit has no more edges
leading to the deceleration lane that keep it busy waiting, and closes. Since lower bits have fewer
edges going to the lane, it is clear that this will be the least unset bit $\mu b$.

The second phase, called the set phase, corresponds to a strategy $\sigma$ in which the least unset bit
has just been set, i.e. to the global state $b[\mu b \mapsto 1] = b_\sigma$. The selector nodes and roots are as in
phase 1 and also the access states, i.e. $b = a_\sigma$.

More formally, we say that $\sigma$ is a $b$-phase 2 strategy iff all the following conditions hold:

(1) $b[\mu b \mapsto 1] = b_\sigma$ and $b = a_\sigma$, i.e. set bits correspond to closed and accessed (for all set bits
except for $\mu b$) cycle gates, while unset bits correspond to open and skipped cycle gates,

(2) $\mathrm{root}(\sigma) = r$, i.e. the strategy is rooted in $r$,

(3) $\sigma(s) = \sigma(r) = \nu b$, i.e. both roots are connected to the former least set bit,

(4) $\sigma(k_i) = \min(\{j > i \mid b_j = 1\} \cup \{n + 1\})$, i.e. the selector nodes move to the next set bit,

(5) $\mathrm{ind}(\sigma) \leq 2\mu b + 3$, i.e. the deceleration lane has not passed the next bit, and

(6) $\sigma(d_j) \neq \mathrm{ind}(\sigma) - 1$ for all $j > \mu b$ with $b_j = 0$, i.e. every higher open cycle node is not
connected to the best-valued node of the lane.

Lemma 6.9. Let $\sigma$ be a $b$-phase 1 strategy with $\mathrm{ind}(\sigma) = 2\mu b + 2$ and $\sigma(d_{\mu b}) = \mathrm{ind}(\sigma)$. Then $\sigma'$
is a $b$-phase 2 strategy.



Proof. This can be shown essentially the same way as Lemma 6.8. The only difference now is that
$d_{\mu b}$ has no more improving switches to the deceleration lane and hence, by Lemma 6.2(3), we learn
that the $\mu b$-cycle has to close. □

In phase 2, the deceleration lane is still assembling, and the improving switches again include
edges of open simple cycles and edges of the deceleration lane. Additionally, it is improving for the
cycle gate $\mu b$ to be accessed and for the root $s$ to update to cycle gate $\mu b$. By performing all these
switches, we enter phase three.

The third phase, called the access phase, is defined by a renewed correspondence of the cycle
gate structure, i.e. $b[\mu b \mapsto 1] = b_\sigma = a_\sigma$. The $s$ root is connected to $\mu b$ while $r$ is still
connected to $\nu b$. This implies that $s$ now has a much better valuation than $r$.

More formally, we say that $\sigma$ is a $b$-phase 3 strategy iff all the following conditions hold:

(1) $b[\mu b \mapsto 1] = b_\sigma = a_\sigma$, i.e. set bits correspond to closed and accessed cycle gates, while unset
bits correspond to open and skipped cycle gates,

(2) $\mathrm{root}(\sigma) = r$, i.e. the strategy is rooted in $r$,

(3) $\sigma(s) = \mu b$ and $\sigma(r) = \nu b$, i.e. one root is connected to the new set bit and the other one is still
connected to the former least set bit,

(4) $\sigma(k_i) = \min(\{j > i \mid b_j = 1\} \cup \{n + 1\})$, i.e. the selectors move to the former next set bit,

(5) $\sigma(d_j) \neq s$ for all $j > \mu b$ with $b_j = 0$, i.e. every higher open cycle node is not connected to the
best-valued root node.

Lemma 6.10. Let $\sigma$ be a $b$-phase 2 strategy. Then $\sigma'$ is a $b$-phase 3 strategy.



Proof. Again, this can be shown essentially as the previous Lemmas 6.8 and 6.9. The main difference
is that now $f_i \prec_\sigma f_{\mu b}$ for all $i \neq \mu b$, which is why $\sigma'(s) = \mu b$, and that by Lemma 6.6(1) we
have that the $\mu b$-th gate is $\sigma'$-accessed. □

The cycle gate with the best valuation is now $\mu b$, hence there are many improving switches
that eventually lead to cycle gate $\mu b$. First, there are all nodes of the deceleration lane that have
improving switches to $s$. Second, $r$ has an improving switch to $\mu b$. Third, lower closed cycles (all
lower cycles should be closed!) have an improving switch to $\mu b$ (opening them again). Fourth, all
lower selector nodes have an improving switch to $\mu b$. By performing all these switches, we enter
phase four.

The fourth and last phase, called the reset phase, corresponds to a strategy $\sigma$ that performed the
full increment, i.e. $b_\sigma = b \oplus 1$. However, the access states are not reset, i.e. $a_\sigma = b[\mu b \mapsto 1]$, and the
deceleration lane is moving to root $s$.

More formally, we say that $\sigma$ is a $b$-phase 4 strategy iff all the following conditions hold:

(1) $b \oplus 1 = b_\sigma$ and $b[\mu b \mapsto 1] = a_\sigma$, i.e. set bits correspond to closed and accessed cycle gates,
while unset bits correspond to open and skipped ($> \mu b$) resp. accessed ($< \mu b$) cycle gates,

(2) $\mathrm{root}(\sigma) = s$, i.e. the strategy is rooted in $s$,

(3) $\sigma(s) = \sigma(r) = \mu b$, i.e. both roots are connected to the new set bit,

(4) $\sigma(k_i) = \min(\{j > i \mid (b \oplus 1)_j = 1\} \cup \{n + 1\})$, i.e. the selectors move to the new next set bit,

(5) $\mathrm{ind}(\sigma) = 0$, i.e. the deceleration lane has reset, and

(6) $\sigma(d_j) = s$ for all $j$ with $(b \oplus 1)_j = 0$, i.e. every open cycle node is connected to the $s$ root.

Lemma 6.11. Let $\sigma$ be a $b$-phase 3 strategy. Then $\sigma'$ is a $b$-phase 4 strategy.

Proof. Let $\sigma$ be a $b$-phase 3 strategy, $\Xi := \Xi_\sigma$ and $b' := b[\mu b \mapsto 1]$.

We first compute the valuations for all those nodes directly that do not involve any complicated
strategy decision of player 1. Obviously, $\Xi(x) = \emptyset$. By Lemma 6.1(1) we know that for all set bits
$i$ (i.e. $b'_i = 1$) we have the following.

$\Xi(e_i) = \{e_i\} \cup \Xi(h_i)$ \quad $\Xi(d_i) = \{e_i, d_i\} \cup \Xi(h_i)$ \quad $\Xi(f_i) = \{e_i, f_i\} \cup \Xi(h_i)$

Using these equations, we are able to compute many other valuations that do not involve any
complicated strategy decision of player 1. Let $U_j = \{g_j, f_j, e_j, h_j, k_j\}$.

$\Xi(k_i) = \{k_i\} \cup \bigcup\{U_j \mid j > i,\ b_j = 1\}$
$\Xi(h_i) = \{h_i, k_i\} \cup \bigcup\{U_j \mid j > i,\ b_j = 1\}$
$\Xi(g_i) = \{g_i, k_i\} \cup \bigcup\{U_j \mid j > i,\ b_j = 1\} \cup CF_{i = \mu b}(U_i)$
$\Xi(r) = \{r\} \cup \bigcup\{U_j \mid b_j = 1\}$
$\Xi(s) = \{s\} \cup \bigcup\{U_j \mid j \geq \mu b,\ b'_j = 1\} \setminus \{g_{\mu b}\}$
$\Xi(c) = \{c, r\} \cup \bigcup\{U_j \mid b_j = 1\}$
$\Xi(t_i) = \{t_i\} \cup \Xi(r) \cup CF_{i < \mathrm{ind}(\sigma)}(\{t_j \mid j < i\} \cup \{c\})$
$\Xi(a_i) = \{a_i\} \cup \Xi(t_i)$

We have the following orderings on the nodes specified above.

$r \prec_\sigma a_* \prec_\sigma h_{* < \mu b} \prec_\sigma s \prec_\sigma h_{* > \mu b}$ \quad (a)

Note that the last inequality $s \prec_\sigma h_{i > \mu b}$ holds for the following reason: If $i$ corresponds to a
set bit, then the path from $s$ eventually reaches the node $h_i$, but the highest priority on the way to $h_i$
is $f_i$, which is odd. If $i$ on the other hand corresponds to an unset bit, then the path from $s$ to the sink
shares the common postfix with $h_i$, which starts with the node $\sigma(k_i)$. Comparing the two differing
prefixes shows that the most significant difference is $h_i$ itself, which is even.

By Lemma 6.1, it follows from (a) that $\tau_\sigma(e_i) = d_i$ for all unset bits $i$ (i.e. $b'_i = 0$), hence
we are able to compute the valuations of the remaining nodes.

$\Xi(e_i) = \{e_i\} \cup \Xi(d_i)$ \quad $\Xi(f_i) = \{e_i, f_i\} \cup \Xi(d_i)$

It is easy to see that for every $i$ with $(b \oplus 1)_i = 0$ and every $j$ with $(b \oplus 1)_j = 1$ s.t. there is no
$i < i' < j$ with $(b \oplus 1)_{i'} = 1$, the following holds:

$f_i \prec_\sigma f_j$ \quad $g_i \prec_\sigma g_j$ \quad (b)

Similarly, for $i > j$ with $(b \oplus 1)_i = 1$ and $(b \oplus 1)_j = 1$ we have

$f_i \prec_\sigma f_j$ \quad $g_i \prec_\sigma g_j$ \quad (c)

We are now ready to prove that $\sigma'$ is of the desired form.

(1) By Lemma 6.2(1) and (a) we derive that closed cycles with index $i \geq \mu b$ remain closed. By
Lemma 6.2(4) and (a) we derive that closed cycles with index $i < \mu b$ open. By Lemma 6.6(1)
we derive that closed cycles remain accessed. By (a) and Lemma 6.6(2) we derive that open
cycles remain skipped.

By phase 3 condition (5) and (a), it follows that for every $j$ with $b_j = 0$, there is the improving
node $s$ for $d_j$. By Lemma 6.2(2), we conclude that open cycles remain open.

(2) By (a) and Lemma 6.4(1).

(3) By (b) and (c).

(4) By (b) and (c).

(5) By Lemma 6.4.

(6) By (a) and Lemma 6.2.

□



By switching the lane back to the initial configuration and the access states to match the simple 
cycles states, we end up in phase 1 again that corresponds to the incremented global counter state. 

Lemma 6.12. Let $\sigma$ be a $b$-phase 4 strategy and $b \oplus 1 \neq 1_n$. Then $\sigma'$ is a $b \oplus 1$-phase 1 strategy
with $\mathrm{ind}(\sigma') = 1$.

Proof. Let $\sigma$ be a $b$-phase 4 strategy, $\Xi := \Xi_\sigma$ and $b' = b \oplus 1$.

We first compute the valuations for all those nodes directly that do not involve any complicated
strategy decision of player 1. Obviously, $\Xi(x) = \emptyset$. By Lemma 6.1(1) we know that for all set bits
$i$ (i.e. $b'_i = 1$) we have the following.

$\Xi(e_i) = \{e_i\} \cup \Xi(h_i)$ \quad $\Xi(d_i) = \{e_i, d_i\} \cup \Xi(h_i)$ \quad $\Xi(f_i) = \{e_i, f_i\} \cup \Xi(h_i)$

Using these equations, we are able to compute many other valuations that do not involve any
complicated strategy decision of player 1. Let $U_j = \{g_j, f_j, e_j, h_j, k_j\}$. The following holds:

$\Xi(k_i) = \{k_i\} \cup \bigcup\{U_j \mid j > i,\ b'_j = 1\}$
$\Xi(h_i) = \{h_i, k_i\} \cup \bigcup\{U_j \mid j > i,\ b'_j = 1\}$
$\Xi(r) = \{r\} \cup \bigcup\{U_j \mid b'_j = 1\}$
$\Xi(s) = \{s\} \cup (\bigcup\{U_j \mid b'_j = 1\} \setminus \{g_{\mu b}\})$
$\Xi(c) = \{c\} \cup \Xi(s)$
$\Xi(t_i) = \{t_i\} \cup \Xi(s)$
$\Xi(a_i) = \{a_i, t_i\} \cup \Xi(s)$

Additionally for all $i > \mu b$ we have:

$\Xi(g_i) = \{g_i, k_i\} \cup \bigcup\{U_j \mid j > i,\ b'_j = 1\}$
It is easy to see that we have the following orderings on the nodes specified above.

$s \prec_\sigma a_* \prec_\sigma r \prec_\sigma h_*$ (a)

By Lemma 6.1(2), it follows from (a) that $\tau_\sigma(e_i) = d_i$ for all unset bits $i$ (i.e. $b'_i = 0$), hence we are able to compute the valuations of the remaining nodes.

$\Xi(d_i) = \{d_i\} \cup \Xi(s) \qquad \Xi(e_i) = \{e_i, d_i\} \cup \Xi(s) \qquad \Xi(f_i) = \{f_i, e_i, d_i\} \cup \Xi(s)$

Additionally for all $i < \mu_b$, we have:

$\Xi(g_i) = \{g_i, f_i, e_i, d_i\} \cup \Xi(s)$

This completes the valuation $\Xi$ for all nodes.

It is easy to see that for every $i$ with $b'_i = 0$ and every $j$ with $b'_j = 1$ s.t. there is no $i < i' < j$ with $b'_{i'} = 1$, the following holds:

$f_i \prec_\sigma f_j \qquad g_i \prec_\sigma g_j$ (b)

Similarly, for $i > j$ with $b'_i = 1$ and $b'_j = 1$ we have

$f_i \prec_\sigma f_j \qquad g_i \prec_\sigma g_j$ (c)



We are now ready to prove that $\sigma'$ is of the desired form.

dT]) By Lemma 6.2[ T|) and di]) we derive that closed cycles remain closed. By Lemma 6.6( 1} we 
derive that closed cycles remain accessed. By dS} and Lemma 6.6p I we derive that open cycles 
remain or will be skipped. 

By Lemma [6i2p ]> and Q, we conclude that open cycles remain open. 

( 



By ( 
By (|b 
<I4J> By db 



and Lemma|6.4||2 
) and ( c 
) and dc 



By Lemma [Mp]) it follows that ind(a') = 1. 

By (a) it follows that $\sigma'(d_i) = r$ for every $i$ with $b'_i = 0$.



□ 



6.6. Lower Bound Proof. Finally, we are ready to prove that our family of games really implements a binary counter. From Lemmas 6.8, 6.9, 6.10, 6.11 and 6.12 we immediately derive the following.

Lemma 6.13. Let $\sigma$ be a phase 1 strategy and $b_\sigma \neq 1_n$. There is some $k > 4$ s.t. $\sigma' = (\mathcal{I}^{\mathrm{loc}})^k(\sigma)$ is a phase 1 strategy and $b_{\sigma'} = b_\sigma \oplus 1$.

Particularly, we conclude that strategy improvement with the locally optimizing policy requires exponentially many iterations on $G_n$.

Theorem 6.14. Let $n > 0$. The Strategy Improvement Algorithm with the $\mathcal{I}^{\mathrm{loc}}$-policy requires at least $2^n$ improvement steps on $G_n$ starting with $\iota_{G_n}$.



6.7. Remarks. One could conjecture that 1-sink games form a "degenerate" class of parity games as they are always won by player 1. Remember that the problem of solving parity games is to determine the complete winning sets for both players. Given a strategy $\sigma$ of player 0, we know by Theorem 3.3 that both winning sets can be directly inferred if $\sigma$ is the optimal strategy. But it is also possible to derive some information about player 0's winning set given a non-optimal strategy. More precisely, $W_0 \supseteq \{v \mid \Xi_\sigma(v) = (w, \_) \text{ and } w \in V_\oplus\}$.

In other words: Is there a family of games on which the strategy improvement algorithm requires exponentially many iterations to find a player 0 strategy that wins at least one node in the game?

The answer to this question is positive. Simply take our lower bound games $G_n$ and remove the edge from $e_n$ to $h_n$. Remember that the first time player 1 wants to use this edge by best response is when the binary counter is about to flip bit $n$, i.e. after it processed $2^{n-1}$ many counting steps. Eventually, the player 0 strategy is updated s.t. $\sigma(d_n) = e_n$, forcing player 1 by best response to move to $h_n$. Removing this edge leaves player 1 no choice but to stay in the cycle which is dominated by player 0.
7. Improving the Lower Bound Construction

We briefly address two improvements of our construction. First, we explain how to reduce the 
number of edges s.t. the overall size of the games is linear in n. Second, we describe how to obtain 
a lower bound construction with binary edge outdegree. 

7.1. Linear Number of Edges. Consider the lower bound construction again. It consists of a deceleration lane, cycle gates, two roots and connectives between these structures. All three kinds of structures only have linearly many edges when considered on their own. The quadratic number of edges is solely due to the $d_*$-nodes of the simple cycles of the cycle gates that are connected to the deceleration lane and due to the $k_*$-nodes of the cycle gates that are connected to all higher cycle gates.

We focus on the edges connecting the $d_*$-nodes with the deceleration lane first. Their purpose is twofold: lower cycle gates have fewer edges to the deceleration lane (so they close first), and as long as an open cycle gate should be prevented from closing, there must be a directly accessible lane input node in every iteration with a better valuation than the currently chosen lane input node.

Instead of connecting $d_i$ to all $a_j$ with $j < 2i + 1$, it would suffice to connect $d_i$ to two intermediate nodes, say $y_i$ and $z_i$, that are controlled by player 0 with negligible priorities. We connect $z_i$ to all $a_j$ with even $j < 2i + 1$ and $y_i$ to all $a_j$ with odd $j < 2i + 1$. By this construction, we shift the "busy updating"-part alternately to $y_i$ and $z_i$, and $d_i$ remains updating as well by switching from $y_i$ to $z_i$ and vice versa in every iteration.

Next, we observe that the edges connecting $y_i$ (resp. $z_i$) to the lane are a proper subset of the edges connecting $y_{i+1}$ (resp. $z_{i+1}$) to the lane, and hence we adapt our construction in the following way. Instead of connecting $y_{i+1}$ to all $a_j$ with odd $j < 2i + 3$ (and similarly $z_{i+1}$ to all $a_j$ with even $j < 2i + 3$), we simply connect $y_{i+1}$ to $a_{2i+1}$ and to $y_i$. In order to ensure proper resetting of the two intermediate lanes constituted by $y_*$ and $z_*$ in concordance with the original deceleration lane, we need to connect every additional node to $c$. See Figure 5 for the construction (note that by introducing new nodes with "negligible priorities", we simply shift all other priorities in the game).

Second, we consider the edges connecting lower cycle gates with higher cycle gates. As the set of edges connecting $k_{i+1}$ with higher $g_j$ is a proper subset of the corresponding set for $k_i$, we can apply a similar construction by attaching an additional lane to the cycle gate connections that subsumes shared edges.
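The edge saving for the lane connections can be checked with a small sketch. It is only an illustration under stated assumptions: the node names are strings chosen here, the helper wiring is read as $y_i \to \{a_{2i-1}, y_{i-1}\}$ and $z_i \to \{a_{2i}, z_{i-1}\}$, and the additional reset edges to $c$ as well as all priorities are left out.

```python
def direct_edges(n):
    """Original wiring: d_i is connected to every a_j with 1 <= j < 2i + 1."""
    return {(f"d{i}", f"a{j}") for i in range(1, n + 1) for j in range(1, 2 * i + 1)}


def layered_edges(n):
    """Wiring via the two intermediate lanes y_* (odd inputs) and z_* (even inputs)."""
    edges = set()
    for i in range(1, n + 1):
        edges.add((f"d{i}", f"y{i}"))
        edges.add((f"d{i}", f"z{i}"))
        edges.add((f"y{i}", f"a{2 * i - 1}"))  # the one new odd lane input
        edges.add((f"z{i}", f"a{2 * i}"))      # the one new even lane input
        if i > 1:
            edges.add((f"y{i}", f"y{i - 1}"))  # inherit all lower odd inputs
            edges.add((f"z{i}", f"z{i - 1}"))  # inherit all lower even inputs
    return edges


def reachable_lane_inputs(edges, start):
    """Lane input nodes a_j reachable from start along the given edges."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for target in successors.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return {node for node in seen if node.startswith("a")}
```

In this sketch the direct wiring uses $n(n+1)$ edges while the layered wiring uses $6n - 2$, and every $d_i$ still reaches exactly the lane inputs $a_1, \ldots, a_{2i}$.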

7.2. Binary Outdegree. Every parity game can be linear-time reduced to an equivalent (in the 
sense that winning sets and strategies can be easily related to winning sets and strategies in the 
original game) parity game with an edge outdegree bounded by two. See Figure 6 for an example
of such a transformation.

However, not every such transformation that can be applied to our construction (for clarity of presentation, we start with our original construction again) yields games on which strategy iteration still requires an exponential number of iterations. We discuss the necessary transformations for every player 0 controlled node in the following, although we omit the exact priorities of additional helper nodes. It suffices to assign arbitrary even priorities to the additional nodes that lie below the priorities of all other nodes of the original game (except for the 1-sink).

First, we consider the two root nodes $s$ and $r$, that are connected to the 1-sink $x$ and to $f_1, \ldots, f_n$ resp. $g_1, \ldots, g_n$. As $r$ copies the decision (see the transition from the access to the reset phase) of $s$, it suffices to describe how the outdegree-two transformation is to be applied to $s$. We introduce $n$ additional helper nodes $s'_1, \ldots, s'_n$, replace the outgoing edges of $s$ by $x$ and $s'_n$, connect $s'_{i+1}$ with $f_{i+1}$ and $s'_i$, and finally $s'_1$ simply with $f_1$.
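The ladder for the root $s$ can be written down as a short sketch. The function name and the string node names are hypothetical and only serve the illustration; priorities of the helper nodes are omitted, as in the text.

```python
def root_ladder(root, sink, targets):
    """Successor lists of the outdegree-two ladder for one root node.

    targets is the list [f_1, ..., f_n]. The root keeps two edges, one to the
    sink and one to the top helper; helper i+1 leads to f_{i+1} and to helper i,
    and helper 1 leads to f_1 only.
    """
    n = len(targets)
    helper = [f"{root}'{i}" for i in range(1, n + 1)]
    successors = {root: [sink, helper[-1]], helper[0]: [targets[0]]}
    for i in range(1, n):
        successors[helper[i]] = [targets[i], helper[i - 1]]
    return successors
```

For example, `root_ladder("s", "x", ["f1", "f2", "f3"])` yields a graph in which no node has more than two successors and all of `x`, `f1`, `f2`, `f3` remain reachable from `s`.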
Figure 5: Intermediate Layer




Figure 6: Binary Outdegree Transformation 



It is still possible to show that $s$ reaches the best valued $f_i$ after one iteration. Assume that $s$ currently reaches some cycle gate $i$ via the ladder that is given by the helper nodes. Let $j$ be the next best-valued cycle gate that just has been set. If $j > i$, it follows that $s$ currently reaches $s'_j$ that moves to $s'_{j-1}$ but updates within one iteration to $f_j$. If $j < i$, it must be the case that $j = 1$ ($i$ is the least bit which was set; $j$ is the least bit which was unset). Moreover, $s$ currently reaches $s'_i$ that moves to $f_i$. All lower $s'_{k+1}$ with $k + 1 < i$ move to $s'_k$ since lower unset cycle gates are more profitable than higher unset cycle gates (unset cycle gates eventually reach one of the roots via the unprofitable $f_*$ nodes). Hence, $s'_i$ updates within one iteration to $s'_{i-1}$.

Second, there are the output nodes of cycle gates $k_1, \ldots, k_n$. We apply a very similar ladder-style construction here. For every $k_i$, we introduce $n - i$ additional helper nodes $k'_{i,j}$ with $i < j \leq n$, replace the outgoing edges of $k_i$ by $x$ and $k'_{i,i+1}$, and connect $k'_{i,j}$ with $g_j$ and $k'_{i,j+1}$ (if $j < n$). The argument why this construction suffices runs similarly as for the root nodes.

Third, there are the nodes $t_1, \ldots, t_{2n}$ of the deceleration lane that are connected to three nodes. Again, we introduce an additional helper node $t'_i$ for every $t_i$, and replace the two edges to $r$ and $t_{i-1}$ resp. $c$ by an edge to $t'_i$ that is connected to $r$ and $t_{i-1}$ resp. $c$ instead. It is not hard to see that this slightly modified deceleration lane still provides the same functionality.

Finally, there are the player 0 controlled nodes $d_1, \ldots, d_n$ of the simple cycles of the cycle gates. Essentially, two transformations are possible here. Both replace $d_i$ by as many helper nodes $d'_{i,x}$ as there are edges from $d_i$ to any other node $x$ but $e_i$. Then, every $d'_{i,x}$ is connected to the target node $x$.

The first possible transformation connects every $d'_{i,x}$ with $e_i$ and vice versa, yielding a multicycle with $e_i$ as the center of each cycle. The second possible transformation connects $e_i$ with the first $d'_{i,x_1}$, $d'_{i,x_1}$ with $d'_{i,x_2}$ etc. and the last $d'_{i,x_l}$ again with $e_i$, yielding one large cycle. Both replacements behave exactly as the original simple cycle.

The transformation described here results in a quadratic number of nodes since we started with 
a game with a quadratic number of edges. We note, however, that a similar transformation can 
be applied to the version of the game with linearly many edges, resulting in a game with binary 
outdegree of linear size. 

8. Lower Bound for the Globally Optimizing Policy 

The lower bound construction for the globally optimizing policy again is a family of 1-sink parity games that implement a binary counter by a combination of a (modified) deceleration lane and a chain of (modified) cycle gates.

This section is organized as follows. First, we discuss the modifications of the deceleration lane and the cycle gates and why they are required to obtain a lower bound for the globally optimizing policy. Then, we present the full construction along with some remarks on its correctness.

The main difference between the locally optimizing policy and the globally optimizing policy 
is that the latter takes cross-effects of improving switches into account. It is aware of the impact of 
any combination of profitable edges, in contrast to the locally optimizing policy that only sees the 
local valuations, but not the effects. 

One primary example that separates both policies is given by the simple cycles of the previous sections: the locally optimizing policy sees that closing a cycle is an improvement, but not that the actual profitability of closing a cycle is much higher than updating to another node of the deceleration lane.

The globally optimizing policy, on the other hand, is well aware of the profitability of closing 
the cycle in one step. In some sense, the policy has the ability of a one-step lookahead. However, 
our lower bound for the globally optimizing policy is not so different from the original construction 
- the trick is to hide very profitable choices by structures that cannot be solved by a single strategy 
iteration. In other words, we simply need to replace the gadgets that can be solved with a one-step 
lookahead by slightly more complicated variations that cannot be solved within one iteration and 
that maintain this property for as long as it is necessary. 

8.1. Modified Deceleration Lane. The modified deceleration lane looks almost the same as the original deceleration lane. It has again several, say $m$, input nodes $a_1, \ldots, a_m$ along with some special input node $c$. We have two output roots, $r$ and $s$, this time with a slightly different connotation. We call $r$ the default root and $s$ the reset root.

More formally, a modified deceleration lane consists of $m$ (in our case, $m$ will be $6n - 2$) internal nodes $t_1, \ldots, t_m$, $m$ input nodes $a_1, \ldots, a_m$, one additional input node $c$, the default root output node $r$ and the reset root output node $s$.

All priorities of the modified deceleration lane are based on some odd priority $p$. We assume that all root nodes have a priority greater than $p + 2m + 1$. The structural difference between the modified deceleration lane and the original one is that the lane base $c$ only has one outgoing edge leading to the default root $r$. See Figure 7 for a deceleration lane with $m = 5$ and $p = 27$. The players, priorities and edges are described in Table 5.



Node        Player   Priority         Successors
t_1         0        p                {s, r, c}
t_{i>1}     0        p + 2i - 2       {s, r, t_{i-1}}
c           1        p + 2m + 1       {r}
a_i         1        p + 2i - 1       {t_i}
s           ?        > p + 2m + 1     ?
r           ?        > p + 2m + 1     ?

Table 5: Description of the Modified Deceleration Lane
Figure 7: A Modified Deceleration Lane (with m = 5 and p = 27)



The intuition behind the two roots is the same as before. The default root r serves as an entry 
point to the cycle gate structure and the reset root s is only used for a short time to reset the whole 
deceleration lane structure. 

We describe the state of a modified deceleration lane again by a tuple specifying which root has been chosen and by how many $t_i$ nodes are already moving down to $c$. Formally, we say that $\sigma$ is in deceleration state $(x, j)$ (where $x \in \{s, r\}$ and $0 < j \leq m + 1$ a natural number) iff

(1) $\sigma(t_1) = c$ if $j > 1$,

(2) $\sigma(t_i) = t_{i-1}$ for all $1 < i < j$, and

(3) $\sigma(t_i) = x$ for all $j \leq i$.
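The three conditions translate directly into a membership test. The following is a minimal sketch, assuming a strategy is given as a dictionary from the hypothetical node names `"t1"`, ..., `"tm"` to successor names (`"c"`, `"s"`, `"r"` or `"t<i-1>"`); this encoding is not taken from the paper.

```python
def in_deceleration_state(sigma, m, root, j):
    """Check whether sigma is in deceleration state (root, j) on a lane of length m."""
    if root not in ("s", "r") or not 0 < j <= m + 1:
        return False
    # (1) t_1 moves to the lane base c as soon as j > 1
    if j > 1 and sigma["t1"] != "c":
        return False
    # (2) every t_i with 1 < i < j moves down to t_{i-1}
    if any(sigma[f"t{i}"] != f"t{i - 1}" for i in range(2, j)):
        return False
    # (3) every t_i with j <= i <= m moves directly to the chosen root
    return all(sigma[f"t{i}"] == root for i in range(j, m + 1))
```

Note that for $j = m + 1$ condition (3) is vacuous, so a fully aligned lane satisfies the definition for both roots.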

The modified deceleration lane treats the two roots differently. If the currently best-valued root is the reset root, it is the optimal choice for all $t_*$-nodes to directly move to the reset root. In other words, no matter what state the deceleration lane is currently in, if the reset root provides the best valuation, it requires exactly one improvement step to reach the optimal setting.

If the currently best-valued root is the default root, however, it is profitable to reach the root via the lane base $c$. The globally optimizing policy behaves in this case just like the locally optimizing policy, because the deceleration lane has exactly one improving switch at a time which is also globally profitable.

The following lemma formalizes the intuitive description of the deceleration lane's behaviour: a change in the ordering of the root valuations leads to a reset of the deceleration lane, otherwise the lane continues to align its edges to eventually reach the best-valued root node via $c$.

It is notable that resetting the lane by an external event (i.e. by giving $s$ a better valuation than $r$) is a bit more difficult than in the case of the locally optimizing policy. Let $\sigma$ be a strategy and $\sigma' = \mathcal{I}^{\mathrm{glo}}(\sigma)$. Assume that the current state of the deceleration lane is $(r, i)$ and now we have that $s$ has a better valuation than $r$, i.e. $s \succ_\sigma r$. Assume further - which for instance applies to our original lower bound construction - that the next strategy $\sigma'$ assigns a better valuation to $r$ again, i.e. $r \succ_{\sigma'} s$. Therefore, it would not be the globally optimal choice to reset the deceleration lane to $s$, but instead just to keep the original root $r$.

In other words, the globally optimizing policy refrains from resetting the lane if the resetting 
event persists for only one iteration. The solution to fool the policy, however, is not too difficult: we 
just alter our construction in such a way that the resetting root will have a better valuation than the 
default root for two iterations. 

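Lemma 8.1. Let $\sigma$ be a strategy that is in deceleration state $(x, i)$. Let $\bar{x}$ denote the other root. Let $\sigma' = \mathcal{I}^{\mathrm{glo}}(\sigma)$. Then

(1) $r \succ_\sigma s$, $x = r$ implies that $\sigma'$ is in state $(r, \min(m, i) + 1)$.

(2) $\bar{x} \succ_\sigma x$ and $\bar{x} \succ_{\sigma'} x$ implies that $\sigma'$ is in state $(\bar{x}, 1)$.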

The purpose of the modified deceleration lane is exactly the same as before: we absorb the 
update activity of cyclic structures that represent the counting bits of the lower bound construction. 

8.2. Stubborn Cycles. With the locally optimizing policy, we employed simple cycles and hid the 
fact that the improving edge leading into the simple cycle results in a much better valuation than 
updating to the next best-valued node of the deceleration lane. 

However, simple cycles do not suffice to fool the globally optimizing policy. If it is possible to 
close the cycle within one iteration, the policy sees that closing the cycle is much more profitable 
than updating to the deceleration lane. 

The solution to this problem is to replace the simple cycle structure by a cycle consisting of more than one player 0 node s.t. it is impossible to close the cycle within one iteration. More precisely, we use a structure consisting again of one player 1 node $e$ and three player 0 nodes $d^1$, $d^2$ and $d^3$, called stubborn cycle. We connect all four nodes with each other in such a way that they form a cycle, and connect all player 0 nodes with the deceleration lane. See Figure 8 for an example of such a situation.

More precisely, we connect the player 0 nodes in a round robin manner to the deceleration lane, for instance $d^1$ to $a_3, a_6, \ldots$, $d^2$ to $a_2, a_5, \ldots$, and $d^3$ to $a_1, a_4, \ldots$. We assume that it is more profitable for player 1 to move into the cyclic structure as long as it is not closed.

Now let $\sigma$ be a strategy s.t. $\sigma$ is in state $(r, 6)$ and $\sigma(d^1) = a_3$, $\sigma(d^2) = d^3$ and $\sigma(d^3) = a_4$. There are exactly two improving switches here: $d^2$ to $a_5$ (which is the best-valued deceleration node) and $d^1$ to $d^2$ (because $d^2$ currently reaches $a_4$ via $d^3$ which has a better valuation than $a_3$). In fact, the combination of both switches is the optimal choice.
Figure 8: A Stubborn Cycle



A close observation reveals that the improved strategy has essentially the same structure as the 
original strategy a: two nodes leave the stubborn cycle to the deceleration lane and one node moves 
into the stubborn cycle. By this construction, we can ensure that cycles are not closed within one 
iteration. In other words, the global policy makes no progress towards closing the cycle (it switches 
one edge towards the cycle, and one edge away from the cycle, leaving it in the exact same position). 

8.3. Modified Cycle Gate. We again use a slightly modified version of the cycle gates as a pass-through structure that is either very profitable or quite unprofitable. Essentially, we apply two modifications. First, we replace the simple cycle by a stubborn cycle, for the reasons outlined in the previous subsection. Second, we put an additional player 0 controlled internal node $y_i$ between the input node $g_i$ and the internal node $f_i$. It will delay the update of $g_i$ to move to the stubborn cycle after closing the cycle by one iteration. By this, we ensure that the modified deceleration lane will have enough time to reset itself.

Formally, a modified cycle gate consists of three internal nodes $e_i$, $h_i$ and $y_i$, two input nodes $f_i$ and $g_i$, and four output nodes $d_i^1$, $d_i^2$, $d_i^3$ and $k_i$. The output node $d_i^1$ (resp. $d_i^2$ and $d_i^3$) will be connected to a set of other nodes $D_i^1$ (resp. $D_i^2$ and $D_i^3$) in the game graph, and $k_i$ to some set $K_i$.

All priorities of the cycle gate are based on two odd priorities $p_i$ and $p'_i$. See Figure 9 for a cycle gate of index 1 with $p'_1 = 3$ and $p_1 = 65$. The players, priorities and edges are described in Table 6.



Node     Player   Priority     Successors
d_i^1    0        p'_i         {d_i^2} ∪ D_i^1
d_i^2    0        p'_i + 2     {d_i^3} ∪ D_i^2
d_i^3    0        p'_i + 4     {e_i} ∪ D_i^3
e_i      1        p'_i + 5     {d_i^1, h_i}
y_i      0        p'_i + 6     {f_i, k_i}
g_i      0        p'_i + 7     {y_i, k_i}
k_i      0        p_i          K_i
f_i      1        p_i + 2      {e_i}
h_i      1        p_i + 3      {k_i}

Table 6: Description of the Modified Cycle Gate
Figure 9: A Modified Cycle Gate (index 1 with $p'_1 = 3$ and $p_1 = 65$)



From an abstract point of view, we describe the state of a modified cycle gate again by a pair $(\beta_i(\sigma), \alpha_i(\sigma)) \in \{0, 1, 2, 3\} \times \{0, 1, 2\}$. The first component describes the state of the stubborn cycle, counting the number of edges pointing into the cycle, and the second component gives the state of the two access control nodes. Formally, we have the following.

$\beta_i(\sigma) := |\{d_i^j \mid \sigma(d_i^j) \notin D_i^j\}|$

$\alpha_i(\sigma) := \begin{cases} 2 & \text{if } \sigma(g_i) = y_i \\ 0 & \text{if } \sigma(g_i) = \sigma(y_i) = k_i \\ 1 & \text{otherwise} \end{cases}$



The behaviour is formalized in terms of modified cycle gate states as follows. Intuitively, it functions as the original cycle gates: if the cycle is $\sigma$-closed and remains closed, it is profitable to go through the cycle gate. If the cycle opens by some external event and remains open, it is more profitable to directly move to the output node instead.

Lemma 8.2. Let $\sigma$ be a strategy and $\sigma' = \mathcal{I}^{\mathrm{glo}}(\sigma)$.

(1) If $\beta_i(\sigma) = \beta_i(\sigma') = 3$, we have $\alpha_i(\sigma') = \min(\alpha_i(\sigma) + 1, 2)$ ("closed gates will be successively accessed").

(2) If $\beta_i(\sigma) < 3$, $\beta_i(\sigma') < 3$ and $\sigma(k_i) \succ_{\sigma'} f_i$, we have $\alpha_i(\sigma') = 0$ ("open gates with unprofitable exit nodes will be skipped").

We use modified cycle gates again to represent the bit states of a binary counter: unset bits will 
correspond to modified cycle gates with the state (1,0), set bits to the state (3,2). Setting and 
resetting bits therefore traverses more than one phase, more precisely, from (1, 0) over (2, 0), (3, 0) 
and (3, 1) to (3, 2), and from the latter again over (1, 2) to (1, 0). 
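The gate state can be read off a strategy mechanically. The sketch below assumes a dictionary encoding of the strategy with hypothetical string names (`"d1_1"`, `"g_1"`, ...) and counts an edge as pointing into the stubborn cycle when it leads to the next cycle node, which coincides with the definition as long as the sets $D_i^j$ contain no cycle nodes.

```python
def gate_state(sigma, i):
    """Return the pair (beta_i, alpha_i) of the modified cycle gate with index i."""
    # successor inside the stubborn cycle for each of the three player 0 nodes
    cycle_successor = {
        f"d1_{i}": f"d2_{i}",
        f"d2_{i}": f"d3_{i}",
        f"d3_{i}": f"e_{i}",
    }
    beta = sum(1 for node, nxt in cycle_successor.items() if sigma[node] == nxt)

    g, y, k = f"g_{i}", f"y_{i}", f"k_{i}"
    if sigma[g] == y:
        alpha = 2
    elif sigma[g] == k and sigma[y] == k:
        alpha = 0
    else:
        alpha = 1
    return beta, alpha
```

With this encoding an unset bit, where only $d_i^1$ points into the cycle and both $g_i$ and $y_i$ move to $k_i$, yields `(1, 0)`, and a set bit with a closed cycle and $g_i$ moving to $y_i$ yields `(3, 2)`.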



8.4. Modified Construction. In this subsection, we provide the complete construction of the lower bound family for the globally optimizing policy. It again consists of a 1-sink $x$, a modified deceleration lane of length $6n - 2$ that is connected to the two roots $s$ and $r$, and $n$ modified cycle gates. The stubborn cycles of the cycle gates are connected to the $r$ root, the lane base $c$ and to the deceleration lane. The modified cycle gates are connected to each other in the same manner as in the original lower bound structure for the locally optimizing policy.

The way the stubborn cycles are connected to the deceleration lane is more involved than in the previous lower bound construction. Remember that for all open stubborn cycles, we need to maintain the setting in which two edges point to the deceleration lane while the other points into the cycle. We achieve this task by assigning the three nodes of the respective stubborn cycle to the input nodes of the deceleration lane in a round-robin fashion.

We now give the formal construction. The games are denoted by $H_n = (V_n, V_{n,0}, V_{n,1}, E_n, \Omega_n)$. The sets of nodes are

$V_n := \{x, s, c, r\} \cup \{a_i, t_i \mid 0 < i \leq 6n - 2\} \cup \{d_i^1, d_i^2, d_i^3, e_i, f_i, h_i, g_i, y_i, k_i \mid 0 < i \leq n\}$

The players, priorities and edges are described in Table 7. The game $H_3$ is depicted in Figure 10. However, the edges connecting the cycle gates with the deceleration lane are not included in the figure.



Node      Player   Priority        Successors
t_1       0        8n + 3          {s, r, c}
t_{i>1}   0        8n + 2i + 1     {s, r, t_{i-1}}
a_i       1        8n + 2i + 2     {t_i}
c         1        20n             {r}
d_i^1     0        8i + 1          {s, c, d_i^2} ∪ {a_{3j+3} | j ≤ 2i - 2}
d_i^2     0        8i + 3          {d_i^3} ∪ {a_{3j+2} | j ≤ 2i - 2}
d_i^3     0        8i + 5          {e_i} ∪ {a_{3j+1} | j ≤ 2i - 1}
e_i       1        8i + 6          {d_i^1, h_i}
y_i       0        8i + 7          {f_i, k_i}
g_i       0        8i + 8          {y_i, k_i}
k_i       0        20n + 4i + 3    {x} ∪ {g_j | i < j ≤ n}
f_i       1        20n + 4i + 5    {e_i}
h_i       1        20n + 4i + 6    {k_i}
s         0        20n + 2         {f_j | j ≤ n} ∪ {x}
r         0        20n + 4         {g_j | j ≤ n} ∪ {x}
x         1        1               {x}

Table 7: Lower Bound Construction for the Globally Optimizing Policy



Fact 8.1. The game $H_n$ has $21 \cdot n$ nodes, $3.5 \cdot n^2 + 40.5 \cdot n - 4$ edges and $24 \cdot n + 6$ as highest priority. In particular, $|H_n| = O(n^2)$.
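The counts in Fact 8.1 can be reproduced by building the game graph from Table 7. The sketch below is an illustration under assumptions: node names are strings chosen here, the index bounds in the successor sets of the $d$-nodes are read as inclusive ($j \leq 2i - 2$ resp. $j \leq 2i - 1$), and $f_i$ is assumed to have the single successor $e_i$. Under this reading the node count, the edge count and the highest priority agree with Fact 8.1.

```python
def build_H(n):
    """Return (priority, successors) dictionaries of the game H_n read from Table 7."""
    m = 6 * n - 2  # length of the modified deceleration lane
    priority, successors = {}, {}

    def add(node, prio, succ):
        priority[node] = prio
        successors[node] = set(succ)

    add("x", 1, ["x"])
    add("c", 20 * n, ["r"])
    add("s", 20 * n + 2, [f"f{j}" for j in range(1, n + 1)] + ["x"])
    add("r", 20 * n + 4, [f"g{j}" for j in range(1, n + 1)] + ["x"])
    for i in range(1, m + 1):
        add(f"t{i}", 8 * n + 2 * i + 1, ["s", "r", "c" if i == 1 else f"t{i - 1}"])
        add(f"a{i}", 8 * n + 2 * i + 2, [f"t{i}"])
    for i in range(1, n + 1):
        # stubborn cycle, wired round-robin to the lane inputs (inclusive bounds assumed)
        add(f"d1_{i}", 8 * i + 1,
            ["s", "c", f"d2_{i}"] + [f"a{3 * j + 3}" for j in range(2 * i - 1)])
        add(f"d2_{i}", 8 * i + 3,
            [f"d3_{i}"] + [f"a{3 * j + 2}" for j in range(2 * i - 1)])
        add(f"d3_{i}", 8 * i + 5,
            [f"e{i}"] + [f"a{3 * j + 1}" for j in range(2 * i)])
        add(f"e{i}", 8 * i + 6, [f"d1_{i}", f"h{i}"])
        add(f"y{i}", 8 * i + 7, [f"f{i}", f"k{i}"])
        add(f"g{i}", 8 * i + 8, [f"y{i}", f"k{i}"])
        add(f"k{i}", 20 * n + 4 * i + 3, ["x"] + [f"g{j}" for j in range(i + 1, n + 1)])
        add(f"f{i}", 20 * n + 4 * i + 5, [f"e{i}"])
        add(f"h{i}", 20 * n + 4 * i + 6, [f"k{i}"])
    return priority, successors


def size_of_H(n):
    """Number of nodes, number of edges and highest priority of H_n."""
    priority, successors = build_H(n)
    edges = sum(len(succ) for succ in successors.values())
    return len(priority), edges, max(priority.values())
```

For instance, `size_of_H(1)` evaluates to `(21, 40, 30)`, matching $21n$, $3.5n^2 + 40.5n - 4$ and $24n + 6$ for $n = 1$.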

As an initial strategy we select the following strategy $\iota_{H_n}$. Again, it corresponds to a global counter setting in which no bit has been set.

$\iota_{H_n}(t_1) = c \qquad \iota_{H_n}(t_{1 < i \leq 3}) = t_{i-1} \qquad \iota_{H_n}(t_{i > 3}) = r$

$\iota_{H_n}(c) = r \qquad \iota_{H_n}(d_i^1) = d_i^2 \qquad \iota_{H_n}(d_i^2) = a_2$

$\iota_{H_n}(d_i^3) = a_1 \qquad \iota_{H_n}(g_i) = k_i \qquad \iota_{H_n}(y_i) = k_i$

$\iota_{H_n}(k_i) = x \qquad \iota_{H_n}(s) = x \qquad \iota_{H_n}(r) = x$

It is easy to see that the $H_n$ family again is a family of 1-sink games.






Figure 11: Empirical Evaluation (plot markers: + Locally Optimizing Policy, x Globally Optimizing Policy; horizontal axis: G[n], linear scale)

Lemma 8.3. Let $n > 0$.

(1) The game $H_n$ is completely won by player 1.

(2) $x$ is the 1-sink of $H_n$ and the cycle component of $\Xi_{\iota_{H_n}}(w)$ equals $x$ for all $w$.

Again, we note that it is possible to refine the family $H_n$ in such a way that it only comprises a linear number of edges and only outdegree two.

8.5. Remarks. The proof that the construction is correct runs almost exactly the same as for the locally optimizing policy. Every global counting step is separated into some counting iterations of the deceleration lane with busy updating of the open stubborn cycles of the cycle gates until the least significant open cycle closes. Then, resetting of the lane, reopening of lower cycles and alignment of connecting edges is carried out.

Theorem 8.4. Let $n > 0$. The Strategy Improvement Algorithm with the $\mathcal{I}^{\mathrm{glo}}$-policy requires at least $2^n$ improvement steps on $H_n$ starting with $\iota_{H_n}$.

Our publicly available PGSOLVER Collection [FL09a] of parity game solvers contains implementations of strategy iteration, and particularly parameterizations with the locally and globally optimizing policy. Additionally, the platform features a number of game generators, including all the games and extensions that are presented here. Benchmarking both strategy iteration variants with our lower bound constructions results in exponential run-time behavior as can be seen in Figure 11.
9. Mean Payoff, Discounted Payoff and Simple Stochastic Games

We now show that the standard reductions [Pur95, ZP96] from parity games to mean payoff, discounted payoff as well as simple stochastic games can be used to derive worst-case families for all the other game classes.

A mean payoff game is a tuple $G = (V, V_0, V_1, E, r)$ where $V$, $V_0$, $V_1$ and $E$ are as in the definition of parity games and $r : V \to \mathbb{R}$ is the so-called reward function. A discounted payoff game is a tuple $G = (V, V_0, V_1, E, r, \beta)$ where $(V, V_0, V_1, E, r)$ is a mean payoff game and $0 < \beta < 1$ is the so-called discount factor. Whenever we do not want to distinguish between a discounted and a mean payoff game, we simply write payoff game.

Strategies and plays are defined exactly the same as in the definition of parity games. Given a play $\pi$, the payoff of the play $R_G(\pi)$ is defined as follows. For a mean payoff game $G = (V, V_0, V_1, E, r)$, it is

$R_G(\pi) := \liminf_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n} r(\pi_k)$

and in the case of a discounted payoff game $G = (V, V_0, V_1, E, r, \beta)$, it is

$R_G(\pi) := \sum_{k=0}^{\infty} \beta^k \cdot r(\pi_k)$
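Both payoffs have closed forms for eventually periodic plays, i.e. plays that consist of a finite prefix followed by a cycle repeated forever, which is the shape of every play conforming to two positional strategies. The following sketch evaluates them exactly; the representation of a play as two lists of rewards is an assumption made for the illustration.

```python
from fractions import Fraction


def mean_payoff(prefix, cycle):
    """Mean payoff of the play prefix . cycle^omega: the average reward on the cycle.

    The finite prefix does not influence the limit.
    """
    return Fraction(sum(cycle), len(cycle))


def discounted_payoff(prefix, cycle, beta):
    """Discounted payoff sum_k beta^k * r(pi_k) of the play prefix . cycle^omega."""
    beta = Fraction(beta)
    head = sum(beta ** k * reward for k, reward in enumerate(prefix))
    loop = sum(beta ** k * reward for k, reward in enumerate(cycle))
    # the cycle contributes a geometric series with ratio beta^len(cycle)
    return head + beta ** len(prefix) * loop / (1 - beta ** len(cycle))
```

For example, a play with prefix reward 4 followed by a self-loop of reward 1 has mean payoff 1 and, for $\beta = 1/2$, discounted payoff $4 + \frac{1}{2} \cdot 2 = 5$.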

Let $G$ be a payoff game. For a given node $v$, a player 0 strategy $\sigma$ and a player 1 strategy $\varrho$, let $\pi_{v,\sigma,\varrho}$ denote the unique play that starts in $v$ and conforms to $\sigma$ and $\varrho$. We say that a node $v$ has a value iff $\sup_\sigma \inf_\varrho R_G(\pi_{v,\sigma,\varrho})$ and $\inf_\varrho \sup_\sigma R_G(\pi_{v,\sigma,\varrho})$ exist, and

$\sup_\sigma \inf_\varrho R_G(\pi_{v,\sigma,\varrho}) = \inf_\varrho \sup_\sigma R_G(\pi_{v,\sigma,\varrho})$

Whenever a node $v$ has a value, we write $\vartheta_G(v) := \sup_\sigma \inf_\varrho R_G(\pi_{v,\sigma,\varrho})$ to refer to it. If every node has a value, we say that a player 0 strategy $\sigma$ is optimal iff $\inf_\varrho R_G(\pi_{v,\sigma,\varrho}) \geq \inf_\varrho R_G(\pi_{v,\sigma',\varrho})$ for every node $v$ and every player 0 strategy $\sigma'$ and similarly for player 1.

Theorem 9.1 ([EM79]). Let $G$ be a mean payoff game. Every node $v$ has a value and there are optimal positional strategies $\sigma$ and $\varrho$ s.t. $\vartheta_G(v) = R_G(\pi_{v,\sigma,\varrho})$ for every $v$.

Note that given two optimal positional strategies, it is fairly easy to compute the associated values.

Parity games can be easily polynomial-time reduced to mean payoff games s.t. optimal strategies correspond to winning strategies and the values of the nodes directly induce corresponding winning sets in the original parity game. Given a parity game $G = (V, V_0, V_1, E, \Omega)$, the $G$-induced mean payoff game $\mathrm{IndMPG}(G) = (V, V_0, V_1, E, r_\Omega)$ operates on the same graph and defines the reward function $r_\Omega$ as follows.

$r_\Omega : v \mapsto (-|V|)^{\Omega(v)}$

Theorem 9.2 ([Pur95]). Let $G$ be a parity game and let $\sigma$ and $\varrho$ be optimal positional strategies w.r.t. $\mathrm{IndMPG}(G)$. Then the following holds.

(1) $W_0 = \{v \in V \mid \vartheta_{\mathrm{IndMPG}(G)}(v) \geq 0\}$ is the $G$-winning set of player 0

(2) $W_1 = \{v \in V \mid \vartheta_{\mathrm{IndMPG}(G)}(v) < 0\}$ is the $G$-winning set of player 1

(3) $\sigma$ is a $G$-winning strategy for player 0 on $W_0$

(4) $\varrho$ is a $G$-winning strategy for player 1 on $W_1$
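The reason the reduction works is that, with rewards $(-|V|)^{\Omega(v)}$, the highest priority on a cycle determines the sign of the cycle's total reward. A minimal sketch, assuming a parity game is given only by a dictionary from (hypothetical) node names to priorities and a cycle by a collection of distinct nodes:

```python
def induced_rewards(priority):
    """Reward function of the induced mean payoff game: v -> (-|V|)^priority(v)."""
    n = len(priority)
    return {node: (-n) ** p for node, p in priority.items()}


def winner_by_parity(cycle, priority):
    """Player (0 or 1) winning a play that eventually repeats the given cycle."""
    return max(priority[node] for node in cycle) % 2


def winner_by_mean_payoff(cycle, rewards):
    """Player 0 wins iff the mean (equivalently the sum) of the cycle rewards is >= 0."""
    return 0 if sum(rewards[node] for node in cycle) >= 0 else 1
```

On a game with four nodes of priorities 1 to 4, the rewards are $-4, 16, -64, 256$, and both functions agree on every cycle, since at most $|V| - 1$ lower rewards cannot outweigh the single largest one.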
Mean payoff games can be reduced to discounted payoff games by specifying a discount factor that is sufficiently close to 1. Given a mean payoff game $G = (V, V_0, V_1, E, r)$, the $G$-induced discounted payoff game $\mathrm{IndDPG}(G) = (V, V_0, V_1, E, r, \beta_G)$ operates on the same graph with the same reward function and defines the discount factor $\beta_G$ as follows. We assume that $r : V \to \mathbb{Z}$.

$\beta_G := 1 - \frac{1}{4 \cdot |V|^3 \cdot \max\{|r(v)| \mid v \in V\}}$

Every parity game $G$ obviously also induces a discounted payoff game via an intermediate mean payoff game.
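To get a feeling for how close to 1 the induced discount factor is, it can be computed exactly. The sketch assumes the Zwick-Paterson style bound $\beta_G = 1 - 1/(4 \cdot |V|^3 \cdot \max_v |r(v)|)$ and integer rewards given as a dictionary; the function name is hypothetical.

```python
from fractions import Fraction


def induced_discount_factor(rewards):
    """Exact discount factor 1 - 1 / (4 * |V|^3 * max |r(v)|) for integer rewards."""
    n = len(rewards)
    largest = max(abs(r) for r in rewards.values())
    return 1 - Fraction(1, 4 * n ** 3 * largest)
```

Already for the four-node example with rewards $-4, 16, -64, 256$ this gives $1 - 1/65536$; for rewards induced by a parity game the gap to 1 shrinks exponentially in the highest priority, which is why exact rational arithmetic is used here instead of floating point.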

Let v be a node in a mean payoff game G, let $(y) be the value of v in G, and let $p(v) be the 
value of v in IndDPG(G). Zwick and Paterson [ZP96] show that the value $(v) can be essentially 
bounded by fip(v), i.e. \dp(v) — -d(v)\ < 2\v^{i-Pc) ' ^ choosing (3 > /3q, it follows that 
can be obtained from ^/s(v) by rounding to the nearest rational with a denominator less than \ V\. It 
follows that optimal strategies in an induced discounted payoff game coincide with optimal strategies
in the original mean payoff game. 

Theorem 9.3 ([ZP96]). Let $G$ be a mean payoff game and let $\sigma$ and $\varrho$ be optimal positional strategies w.r.t. $\mathrm{IndDPG}(G)$. Then $\sigma$ and $\varrho$ are also optimal positional strategies w.r.t. $G$.

Let $G$ be a discounted payoff game. Every player 0 strategy $\sigma$ induces an optimal (not necessarily unique) counterstrategy $\varrho_\sigma$ s.t. $R_G(\pi_{v,\sigma,\varrho_\sigma}) \leq R_G(\pi_{v,\sigma,\varrho'})$ for all other player 1 strategies $\varrho'$ and all nodes $v$. Note that $\varrho_\sigma$ can be computed by solving an LP-problem as described in Algorithm 2.

Algorithm 2 Computation of Optimal Counter Strategy in a DPG

Maximize $\sum_{v \in V} \varphi(v)$ w.r.t.

$\varphi(v) = r(v) + \beta \cdot \varphi(\sigma(v))$ for all $v \in V_0$

$\varphi(v) \leq r(v) + \beta \cdot \varphi(u)$ for all $v \in V_1$ and $u \in vE$



The value assignment $\varphi$ can be computed in strongly polynomial time by applying the algorithm of Madani, Thorup and Zwick [MTZ10] for instance. Given $\varphi$, an optimal counterstrategy $\varrho_\sigma$ can be easily induced.

We say that a strategy $\sigma$ is improvable iff there is a node $v \in V_0$ and a node $u \in vE$ s.t. $R_G(\pi_{\sigma(v),\sigma,\varrho_\sigma}) < R_G(\pi_{u,\sigma,\varrho_\sigma})$. Again, an improvement policy is a function $\mathcal{I}_G : S_0(G) \to S_0(G)$ that satisfies the following two conditions for every strategy $\sigma$.

(1) For every node $v \in V_0$ it holds that $R_G(\pi_{\sigma(v),\sigma,\varrho_\sigma}) \leq R_G(\pi_{\mathcal{I}_G(\sigma)(v),\sigma,\varrho_\sigma})$.

(2) If $\sigma$ is improvable then there is a node $v \in V_0$ s.t. $R_G(\pi_{\sigma(v),\sigma,\varrho_\sigma}) < R_G(\pi_{\mathcal{I}_G(\sigma)(v),\sigma,\varrho_\sigma})$.

As with parity game strategy improvement, it is the case that improving a strategy following improvement edges results indeed in an improved strategy.

Theorem 9.4 ([Pur95]). Let $G$ be a discounted payoff game, $\sigma$ be a player 0 strategy and $\mathcal{I}_G$ be an improvement policy. Then $R_G(\pi_{v,\sigma,\varrho_\sigma}) \leq R_G(\pi_{v,\mathcal{I}_G(\sigma),\varrho_{\mathcal{I}_G(\sigma)}})$ for every node $v$. If $\sigma$ is not optimal, then $\sigma$ is improvable.

Puri's algorithm for solving discounted payoff games - as well as mean payoff and parity games
via the standard reductions - starts with an initial strategy $\iota_G$ and runs for a given improvement
policy $\mathcal{I}_G$ as outlined in Algorithm 3. Note that the algorithmic scheme is exactly the same as the
discrete version for solving parity games.

Algorithm 3 Puri's Algorithm for Solving Discounted Payoff Games

$\sigma \leftarrow \iota_G$
while $\sigma$ is improvable do
    $\sigma \leftarrow \mathcal{I}_G(\sigma)$
end while
return $\sigma$, $\varrho_\sigma$
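
The loop of Algorithm 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper `values` approximates the valuation by fixed-point iteration instead of an LP, the policy shown switches every player 0 node to a successor of maximal current value (one possible reading of a locally optimizing policy), and all identifiers are hypothetical.

```python
def values(V0, succ, r, beta, sigma, iters=500):
    """Values of sigma against a best response of player 1 (approximation)."""
    phi = {v: 0.0 for v in succ}
    for _ in range(iters):
        phi = {
            v: r[v] + beta * (phi[sigma[v]] if v in V0
                              else min(phi[u] for u in succ[v]))
            for v in succ
        }
    return phi


def strategy_iteration(V0, succ, r, beta, sigma, eps=1e-9):
    """Repeat improvement steps until sigma is no longer improvable."""
    sigma, steps = dict(sigma), 0
    while True:
        phi = values(V0, succ, r, beta, sigma)
        switches = {}
        for v in V0:
            best = max(succ[v], key=lambda u: phi[u])
            # Improvable at v: some successor is strictly better than sigma(v).
            if phi[best] > phi[sigma[v]] + eps:
                switches[v] = best
        if not switches:
            return sigma, phi, steps
        sigma.update(switches)
        steps += 1


# Hypothetical toy game: node a initially picks the worse self-loop d.
V0 = {"a", "c", "d"}
succ = {"a": ["c", "d"], "c": ["c"], "d": ["d"]}
r = {"a": 0, "c": 1, "d": -1}
final, phi, steps = strategy_iteration(V0, succ, r, 0.5,
                                       {"a": "d", "c": "c", "d": "d"})
```

The tolerance `eps` guards against floating-point noise; an exact implementation would compare rational values instead.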



Next, we will show that the strategy iteration for discounted payoff games behaves exactly the
same as the strategy iteration for 1-sink-parity games.

Vöge proves in his thesis [Vög00] the following theorem that relates parity game strategy iteration
to Puri's Algorithm for solving the induced discounted payoff game.



Theorem 9.5 ([Vög00]). Let $G$ be a parity game, $H = IndDPG(IndMPG(G))$ be the induced
discounted payoff game and $\sigma$ be a player 0 strategy. For every two nodes $v$ and $u$ the following
holds.

In other words, every improving switch in the original parity game is also an improving switch 
in the induced discounted payoff game. The reason why this holds true is that by the reduction 
from parity games to mean payoff games, the priorities are mapped to such extremely large rewards 
that the largest reward that occurs on a path dominates all lower ones, the largest reward on a cycle 
dominates all other ones and that the cycle itself dominates all finite paths leading into it. 

Theorem 9.5 is almost what we need to show that strategy iteration for discounted payoff games
behaves exactly the same on $IndDPG(IndMPG(G_n))$ as the discrete strategy iteration algorithm
on $G_n$. Essentially, we need to show the converse, which is equivalent to showing

However, this statement is not true for every parity game. The reason why a run of the strategy 
improvement algorithm on general parity games may differ from a run on the induced discounted 
payoff game is that the parity game strategy iteration does not care about the priority of all nodes 
on its path to the dominating cycle node that are less relevant. In case of 1-sink-parity games, the
only occurring dominating cycle node has the least priority in the game, and therefore all priorities 
occurring in paths influence the valuations. Also, the strategy iteration on arbitrary parity games 
does not consider the priorities of all the nodes on a cycle appearing in a node valuation. 

First, we show that optimal player 1 counter strategies in the induced discounted payoff game 
also eventually reach the 1-sink. 

Lemma 9.6. Let $G$ be a 1-sink-parity game with $v^*$ being the 1-sink, $H = IndDPG(IndMPG(G))$
be the induced discounted payoff game, and $\sigma$ be a player 0 strategy s.t. $\Xi_{\iota_G} \trianglelefteq \Xi_\sigma$. Let $v_0 \ne v^*$ be
an arbitrary node. Then, $\pi_{v_0,\sigma,\varrho_\sigma}$ is of the following form:

Proof. Consider the games $G' := G|_\sigma$ and $H' := H|_\sigma$ and note that $\tau_\sigma = \tau_{G'}$ as well as $\varrho_\sigma = \varrho_{H'}$.
Note that $G'$ is won by player 1 following $\tau_{G'}$ since $G$ is a 1-sink parity game.

By Theorems 9.2 and 9.3 it follows that $\varrho_{H'}$ must also be a player 1 winning strategy for
the whole game $G'$. Therefore, it follows that every play $\pi_{v_0,\sigma,\varrho_\sigma}$ eventually ends in a cycle with a
dominating cycle node $w^*$ of odd priority, hence $\Omega(w^*) \ge \Omega(v^*)$.

If $\Omega(w^*) > \Omega(v^*)$, it follows that there is a $w^*$-dominated cycle reachable in $G'$ starting from
$v_0$. But since $\Xi_\sigma(v_0) = (v^*, \_, \_)$, this cannot be the case. Hence $\Omega(w^*) = \Omega(v^*)$, implying that
$w^* = v^*$. □

Second, we show that the value ordering between two different paths leading to the 1-sink again 
depends solely on the most relevant node in the symmetric difference of the paths. 

Lemma 9.7. Let $G$ be a 1-sink-parity game with $v^*$ being the 1-sink, $H = IndDPG(IndMPG(G))$
be the induced discounted payoff game. Let $\pi$ and $\xi$ be two paths of the form $\pi = u_0 u_1 \ldots u_{l-1} (v^*)^\omega$
and $\xi = w_0 w_1 \ldots w_{k-1} (v^*)^\omega$ and let $U = \{u_0, \ldots, u_{l-1}\}$ and $W = \{w_0, \ldots, w_{k-1}\}$. Then
$U \prec W$ implies $R_H(\pi) < R_H(\xi)$.

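Proof. Let $V = \{v_0, \ldots, v_{n-1}\}$ s.t. $p_{n-1} > p_{n-2} > \ldots > p_0$ with $p_i = \Omega(v_i)$ and $v_0 = v^*$, and let
$\beta$ be the discount factor of $H$. W.l.o.g. assume that $n > 2$ since otherwise both paths are necessarily
the same. Let $a : \{1, \ldots, n-1\} \to \{0, \ldots, n-2, \bot\}$ be a map s.t.

\[ a(i) = \begin{cases} j & \text{if } u_j = v_i \\ \bot & \text{if there is no } j \text{ s.t. } u_j = v_i \end{cases} \]

and let $b : \{1, \ldots, n-1\} \to \{0, \ldots, n-2, \bot\}$ be defined accordingly for $w_j$. Set $\beta^\bot := 0$. Note
that the following holds.

\[ R_H(\pi) = \sum_{i=1}^{n-1} \beta^{a(i)} \cdot (-n)^{p_i} + \frac{n \cdot \beta^{l}}{\beta - 1} \qquad R_H(\xi) = \sum_{i=1}^{n-1} \beta^{b(i)} \cdot (-n)^{p_i} + \frac{n \cdot \beta^{k}}{\beta - 1} \]

Let $m = \max\{i \mid (a(i) = \bot \text{ and } b(i) \ne \bot) \text{ or } (a(i) \ne \bot \text{ and } b(i) = \bot)\}$ and note that $m$ indeed
is well-defined. Set

\[ \Delta := R_H(\xi) - R_H(\pi) = \Delta_1 + \Delta_2 + \Delta_3 + \Delta_4 \]

where

\[ \Delta_1 := \sum_{i=m+1}^{n-1} (\beta^{b(i)} - \beta^{a(i)}) \cdot (-n)^{p_i} \]

\[ \Delta_2 := (\beta^{b(m)} - \beta^{a(m)}) \cdot (-n)^{p_m} \]

\[ \Delta_3 := \sum_{i=1}^{m-1} (\beta^{b(i)} - \beta^{a(i)}) \cdot (-n)^{p_i} \]

\[ \Delta_4 := \frac{n \cdot (\beta^{k} - \beta^{l})}{\beta - 1} \]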

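Regarding $\Delta_1$, let $m < i < n$ and consider that $|\beta^{b(i)} - \beta^{a(i)}| \le |1 - \beta^{n-2}|$. The following holds.

\[ |\beta^{b(i)} - \beta^{a(i)}| \cdot n^{p_i} \le (1 - \beta^{n-2}) \cdot n^{p_i} = (1 - \beta) \cdot \sum_{j=0}^{n-3} \beta^j \cdot n^{p_i} \le n \cdot (1 - \beta) \cdot n^{p_i} = \frac{n^{p_i + 1}}{4 \cdot n^{p_{n-1}+3}} \le n^{-2} \]

We conclude that $|\Delta_1| \le (n - 1 - m) \cdot n^{-2} < 1$.

Regarding $\Delta_2$, note that $b(m) \ne \bot$ implies that $p_m$ is even and $b(m) = \bot$ implies that $p_m$ is
odd. Let $c = b(m)$ iff $b(m) \ne \bot$ and $c = a(m)$ otherwise. Hence the following holds.

\[ \Delta_2 = \beta^{c} \cdot n^{p_m} \ge \beta^{n-1} \cdot n^{p_m} = (\beta^{n-1} - 1) \cdot n^{p_m} + n^{p_m} \]

\[ = \Big( \sum_{j=0}^{n-2} \beta^j \Big) \cdot (\beta - 1) \cdot n^{p_m} + n^{p_m} \ge (1 - n) \cdot (1 - \beta) \cdot n^{p_m} + n^{p_m} \]

\[ = \frac{(1 - n) \cdot n^{p_m}}{4 \cdot n^{p_{n-1}+3}} + n^{p_m} \ge \frac{3}{4} \cdot n^{p_m} \]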

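Regarding $\Delta_3$, let $0 < i < m$ and consider that $|\beta^{b(i)} - \beta^{a(i)}| \le 1$. The following holds.

\[ |\Delta_3| \le \sum_{i=1}^{m-1} |\beta^{b(i)} - \beta^{a(i)}| \cdot n^{p_i} \le \sum_{i=1}^{m-1} n^{p_i} \]

Now we need to distinguish on whether $p_m = 2$. If so, note that $m = 1$, $b(m) \ne \bot$ and $k = l + 1$.
Hence, regarding $\Delta_4$, the following holds.

\[ |\Delta_4| = \left| \frac{n \cdot (\beta^{l+1} - \beta^{l})}{\beta - 1} \right| = n \cdot \beta^{l} \le n \]

Therefore we conclude (remember that $n > 2$)

\[ \Delta \ge \Delta_2 - |\Delta_1| - |\Delta_3| - |\Delta_4| \ge \frac{3}{4} \cdot n^2 - 1 - 0 - n > 0 \]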

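Otherwise, if $p_m > 2$, it holds that $|\beta^{k} - \beta^{l}| \le |1 - \beta^{n-1}| \le (n - 1) \cdot (1 - \beta)$ and hence
$|\Delta_4| \le n^2 - n$. Additionally, consider $\Delta_3$ again.

\[ |\Delta_3| \le \sum_{i=1}^{m-1} n^{p_i} \le \sum_{i=2}^{p_m - 1} n^{i} = \sum_{i=0}^{p_m - 1} n^{i} - 1 - n = \frac{n^{p_m} - 1}{n - 1} - 1 - n \]

We conclude the following (remember again that $n > 2$).

\[ \Delta \ge \Delta_2 - |\Delta_1| - |\Delta_3| - |\Delta_4| \]

\[ \ge \frac{3}{4} \cdot n^{p_m} - 1 - \frac{n^{p_m} - 1}{n - 1} + 1 + n - n^2 + n \]

\[ = \frac{3}{4} \cdot n^{p_m} - \frac{n^{p_m} - 1}{n - 1} - (n - 1)^2 + 1 \]

\[ \ge \frac{3}{4} \cdot n^{p_m} - \frac{n^{p_m}}{2} - (n - 1)^2 + 1 \]

\[ = \frac{1}{4} \cdot n^{p_m} - (n - 1)^2 + 1 > 0 \]

□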
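
The statement of Lemma 9.7 can be spot-checked numerically. The Python sketch below evaluates the reward of a path that ends in the 1-sink with exact rational arithmetic. It assumes, as the computations in the proof suggest, that a node of priority $p$ carries the reward $(-n)^p$ and that the discount factor is $\beta = 1 - 1/(4 \cdot n^{p_{n-1}+3})$. The function name `path_reward` and the concrete instance are hypothetical.

```python
from fractions import Fraction


def path_reward(prios, n, beta):
    """Reward of a path through nodes with the given priorities,
    followed by the 1-sink (priority 1) forever."""
    finite = sum(beta ** j * (-n) ** p for j, p in enumerate(prios))
    # Geometric tail: the sink contributes -n in every remaining step.
    tail = beta ** len(prios) * Fraction(-n) / (1 - beta)
    return finite + tail


n, p_max = 4, 4                       # four nodes with priorities 1..4
beta = 1 - Fraction(1, 4 * n ** (p_max + 3))
pi = [2]                              # U = {node of priority 2}
xi = [4, 3]                           # W = {nodes of priority 4 and 3}
delta = path_reward(xi, n, beta) - path_reward(pi, n, beta)
```

Here the most relevant node in the symmetric difference of $U$ and $W$ has the even priority 4 and lies in $W$, so $U \prec W$, and `delta` is indeed positive.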



Third, we derive that the strategy iteration for discounted payoff games behaves exactly the same as
the strategy iteration for 1-sink-parity games.

Theorem 9.8. Let $G$ be a 1-sink-parity game, $v$ be a node, $H = IndDPG(IndMPG(G))$ be the
induced discounted payoff game and $\sigma$ be a player 0 strategy s.t. $\Xi_{\iota_G} \trianglelefteq \Xi_\sigma$. Then $\varrho_\sigma = \tau_\sigma$.

Proof. Assume by contradiction that $\varrho_\sigma \ne \tau_\sigma$. Hence, there is a node $v$ s.t. $\pi_{v,\sigma,\tau_\sigma} \ne \pi_{v,\sigma,\varrho_\sigma}$. Since
$G$ is a 1-sink parity game and $\Xi_{\iota_G} \trianglelefteq \Xi_\sigma$, it follows by Lemma 9.6 that $\pi_{v,\sigma,\varrho_\sigma}$ eventually reaches
the 1-sink. It follows that $R_H(\pi_{v,\sigma,\varrho_\sigma}) < R_H(\pi_{v,\sigma,\tau_\sigma})$, which is impossible due to Lemma 9.7. □

Corollary 9.9. Let $G$ be a 1-sink-parity game, $H = IndDPG(IndMPG(G))$ be the induced discounted
payoff game and $\sigma$ be a player 0 strategy s.t. $\Xi_{\iota_G} \trianglelefteq \Xi_\sigma$. For every two nodes $v$ and $u$ the
following holds.

Corollary 9.10. Puri's algorithm for solving payoff games requires exponentially many iterations
in the worst case when parameterized with the locally or the globally optimal policy.

We note that it is possible to define strategy iteration for mean payoff games directly, i.e. without
applying the reduction to discounted payoff games first. Unfortunately, with mean payoff games,
it is not the case that if $\sigma$ is not optimal then there necessarily exists at least one switch that strictly
improves the reward. There are several ways to remedy this situation; most of them are based on a
lexicographic ordering again with the first component being the reward and the second component 
being a description of the nodes leading to the cycle, usually called potential. We note without proof 
that our results translate to this variant of strategy iteration as well. 

Finally, we relate our results to simple stochastic games. Particularly, we consider simple stochastic
games with arbitrary outdegree and arbitrary probabilities that halt almost surely. Zwick,
Paterson and Condon show that there is a direct correspondence between this version of simple
stochastic games and the original one [ZP96, Con92].

A simple stochastic game is a tuple $G = (V, V_{min}, V_{max}, V_{avg}, 0, 1, E, p)$ s.t. $V_{min}$, $V_{max}$,
$V_{avg}$, $\{0\}$ and $\{1\}$ are a partition of $V$, $(V, E)$ is a directed graph with exactly two sinks 0 and 1,
and $p : E \cap (V_{avg} \times V) \to [0; 1]$ is the probability mapping s.t. $\sum_{u \in vE} p(v, u) = 1$ for all $v \in V_{avg}$.

We say that a simple stochastic game halts with probability 1 iff every node $v$ in $G|_{\sigma,\tau}$ has a
path with non-negligible probabilities to a sink for every pair of strategies $\sigma$ and $\tau$. Every simple
stochastic game can be reduced to an equivalent simple stochastic game that halts with probability
1 in polynomial time [Con92]. We assume from now on that every given simple stochastic game
halts with probability 1.

Given a simple stochastic game and a play $\pi$, we say that player Max wins $\pi$ iff it ends in
the 1-sink and similarly that player Min wins $\pi$ iff it ends in the 0-sink. Let $R_G(v, \sigma, \varrho)$ denote
the probability that player Max wins starting from $v$ conforming to the Max-strategy $\sigma$ and the
Min-strategy $\varrho$.

We say that a node $v$ has a value iff $\sup_\sigma \inf_\varrho R_G(v, \sigma, \varrho)$ and $\inf_\varrho \sup_\sigma R_G(v, \sigma, \varrho)$ exist, and

\[ \sup_\sigma \inf_\varrho R_G(v, \sigma, \varrho) = \inf_\varrho \sup_\sigma R_G(v, \sigma, \varrho) \]

Whenever a node $v$ has a value, we write $\vartheta_G(v) := \sup_\sigma \inf_\varrho R_G(v, \sigma, \varrho)$ to refer to it. If every node
has a value, we say that a player 0 strategy $\sigma$ is optimal iff $\inf_\varrho R_G(v, \sigma, \varrho) \ge \inf_\varrho R_G(v, \sigma', \varrho)$ for
every node $v$ and every player 0 strategy $\sigma'$ and similarly for player 1.

Theorem 9.11 ([Con92]). Let $G$ be a simple stochastic game. Every node $v$ has a value and there
are optimal positional strategies $\sigma$ and $\varrho$ s.t. $\vartheta_G(v) = R_G(v, \sigma, \varrho)$ for every $v$.

Again, simple stochastic games can be solved by strategy iteration. Given a player Max strategy
$\sigma$, a (not necessarily unique) optimal counterstrategy $\varrho_\sigma$ - i.e. $R_G(v, \sigma, \varrho_\sigma) \le R_G(v, \sigma, \varrho')$ for
all other player Min strategies $\varrho'$ and all nodes $v$ - can be computed by solving an LP-problem as
described in Algorithm 4.

The value assignment $\varphi$ can be computed in polynomial time by applying Khachiyan's algorithm
[Kha79] for instance. Given $\varphi$, an optimal counterstrategy $\varrho_\sigma$ can be efficiently deduced. The
strategy iteration that solves the simple stochastic games runs exactly the same as for discounted
payoff games.

Algorithm 4 Computation of Optimal Counter Strategy in a SSG

Maximize $\sum_{v \in V} \varphi(v)$ w.r.t.

$\varphi(v) = \varphi(\sigma(v))$ for all $v \in V_{max}$

$\varphi(v) \le \varphi(u)$ for all $v \in V_{min}$ and $u \in vE$

$\varphi(v) = \sum_{u \in vE} p(v, u) \cdot \varphi(u)$ for all $v \in V_{avg}$

$\varphi(1) = 1$

$\varphi(0) = 0$
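
A minimal Python sketch of the same computation is given below. Instead of an LP solver it iterates the constraints as update rules, which is an assumption made for illustration only and relies on the game halting with probability 1 so that the fixed point is unique. The sinks are encoded as the strings `"0"` and `"1"`, and all identifiers are hypothetical.

```python
def ssg_counter_strategy(Vmax, Vmin, Vavg, succ, p, sigma, iters=1000):
    """Approximate phi and a best response of player Min against sigma."""
    phi = {v: 0.0 for v in succ}
    phi["0"], phi["1"] = 0.0, 1.0
    for _ in range(iters):
        new = {"0": 0.0, "1": 1.0}          # sink values are fixed
        for v in Vmax:
            new[v] = phi[sigma[v]]          # Max follows sigma
        for v in Vmin:
            new[v] = min(phi[u] for u in succ[v])
        for v in Vavg:
            new[v] = sum(p[v, u] * phi[u] for u in succ[v])
        phi = new
    rho = {v: min(succ[v], key=lambda u: phi[u]) for v in Vmin}
    return phi, rho


# Hypothetical toy game: Min chooses between two random nodes.
succ = {"x": ["m"], "m": ["a1", "a2"], "a1": ["1", "0"], "a2": ["1", "0"]}
p = {("a1", "1"): 0.5, ("a1", "0"): 0.5,
     ("a2", "1"): 0.25, ("a2", "0"): 0.75}
phi, rho = ssg_counter_strategy({"x"}, {"m"}, {"a1", "a2"},
                                succ, p, {"x": "m"})
```

In this toy game Min prefers `a2`, which reaches the 1-sink with the smaller probability.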



Zwick and Paterson [ZP96] describe a simple reduction from discounted payoff games to simple
stochastic games that halt with probability 1. Let $G = (V, V_0, V_1, E, r, \beta)$ be a discounted payoff
game and let $l = \min\{r(v) \mid v \in V\}$, $u = \max\{r(v) \mid v \in V\}$ and $d = \max(1, u - l)$.

The $G$-induced simple stochastic game is the game $IndSSG(G) = (V', V_{min}, V_{max}, V_{avg}, 0, 1, E', p)$
where $V_{min} = V_1$, $V_{max} = V_0$, $V_{avg} = E$, $V' = V_{min} \cup V_{max} \cup V_{avg} \cup \{0, 1\}$ and

\[ E' = \{ (v, (v,u)),\ ((v,u), u),\ ((v,u), 0),\ ((v,u), 1) \mid v \in V \text{ and } u \in vE \} \]

\[ p : \begin{cases} ((v,u), u) \mapsto \beta \\ ((v,u), 1) \mapsto (1 - \beta) \cdot \frac{r(v) - l}{d} \\ ((v,u), 0) \mapsto (1 - \beta) \cdot \left(1 - \frac{r(v) - l}{d}\right) \end{cases} \]

Clearly, the induced simple stochastic game halts with probability 1. As Zwick and Paterson pointed
out, the values of the induced simple stochastic game directly correspond to the values of the original
discounted payoff game.

Lemma 9.12 ([ZP96]). Let $G$ be a discounted payoff game and $G' = IndSSG(G)$. Let $\sigma$ be a
player 0 strategy and $\varrho$ be a player 1 strategy. Then $(1 - \beta) \cdot R_G(\pi_{v,\sigma,\varrho}) = d \cdot R_{G'}(v, \sigma, \varrho) + l$ for
every node $v$ where $l = \min\{r(v) \mid v \in V\}$, $u = \max\{r(v) \mid v \in V\}$ and $d = \max(1, u - l)$.

This particularly implies that $R_G(\pi_{v,\sigma,\varrho}) = \frac{d}{1-\beta} \cdot R_{G'}(v, \sigma, \varrho) + \frac{l}{1-\beta}$ with $\frac{d}{1-\beta} > 0$, i.e. the values
of the original discounted payoff game correspond to the values of the induced simple stochastic
game by an affine transformation that preserves the ordering.
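
The identity of Lemma 9.12 can be checked numerically for a fixed pair of positional strategies. The Python sketch below assumes that rewards sit on nodes and that the random node $(v, u)$ moves to the 1-sink with probability $(1 - \beta) \cdot (r(v) - l)/d$ and continues to $u$ with probability $\beta$. The map `nxt`, which encodes the play fixed by both strategies, and the function names are hypothetical.

```python
def discounted_reward(nxt, r, beta, v, steps=200):
    """Truncated discounted sum of the rewards along the play from v."""
    total, weight = 0.0, 1.0
    for _ in range(steps):
        total += weight * r[v]
        weight *= beta
        v = nxt[v]
    return total


def win_probability(nxt, r, beta, v, steps=200):
    """Truncated probability of reaching the 1-sink in the induced SSG."""
    low = min(r.values())
    d = max(1, max(r.values()) - low)
    prob, alive = 0.0, 1.0
    for _ in range(steps):
        # With probability (1 - beta) the play halts here; it wins
        # with the normalized reward (r(v) - l) / d.
        prob += alive * (1 - beta) * (r[v] - low) / d
        alive *= beta
        v = nxt[v]
    return prob


nxt = {"a": "b", "b": "a"}            # play fixed by both strategies
r = {"a": 3, "b": -1}                 # hence l = -1, u = 3, d = 4
beta = 0.5
lhs = (1 - beta) * discounted_reward(nxt, r, beta, "a")
rhs = 4 * win_probability(nxt, r, beta, "a") + (-1)
```

Both sides agree up to the truncation error, which is negligible after 200 steps for $\beta = 0.5$.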

Corollary 9.13. The standard strategy iteration for simple stochastic games requires exponentially 
many iterations in the worst case. 



10. Conclusion 

We have presented a family of games on which the deterministic strategy improvement algorithm 
for parity games requires exponentially many iterations. Additionally, we have shown how to adapt 
this family to prove an exponential lower bound on Schewe's policy. 

Finally, we have shown that the presented family can be used to transfer the exponential lower 
bound to mean payoff, discounted payoff and simple stochastic games by applying the standard 
reductions. 

Although there are many preprocessing techniques that could be used to simplify the family 
of games presented here - e.g. decomposition into strongly connected components, compression
of priorities, direct-solving of simple cycles, see [FL09b] for instance - they are no solution to
the general weakness of strategy iteration on these games, simply due to the fact that all known 
preprocessing techniques can be fooled quite easily without really touching the inner structure of 
the games. 

Parity games are widely believed to be solvable in polynomial time, yet no known algorithm
performs better than superpolynomially. Jurdziński and Vöge presented the strategy
iteration technique for parity games over ten years ago, and since then this class of solving
procedures has generally been regarded as the best candidate to give rise to an algorithm that solves
parity games in polynomial time. Unfortunately, the locally and the globally optimizing techniques are
not capable of achieving this goal.

We think that strategy iteration is still a promising candidate for a polynomial-time algorithm;
however, it may be necessary to alter more of it than just the improvement policy.